We're using Azure CDN to serve images, and I'm trying to understand why images are being strong-cached by web browsers even though there are no Cache-Control or Expires headers in the image response.
For example, the following response headers are returned for an image from Azure CDN:
HTTP/1.1 200 OK
Accept-Ranges: bytes
Access-Control-Allow-Origin: *
Content-MD5: KuQCJm6GQyEjejWwmRgRwQ==
Content-Type: image/jpeg
Date: Tue, 21 Nov 2017 00:15:57 GMT
Etag: 0x8D523228F0F4C42
Last-Modified: Sat, 04 Nov 2017 01:22:47 GMT
Server: ECAcc (meb/A744)
X-Cache: HIT
x-ms-blob-type: BlockBlob
x-ms-lease-status: unlocked
x-ms-request-id: 00822b7c-001e-0045-4194-61d246000000
x-ms-version: 2009-09-19
Content-Length: 5143
<<image data>>
As you can see, an ETag header is returned, but no Cache-Control or Expires headers.
When tracing the network traffic (using Fiddler) from the browser (Chrome), we are NOT seeing any subsequent requests for these images.
My understanding of ETags is that subsequent requests for this image should be revalidated with the server (weak caching), and the server can then return a 304 Not Modified response if the file has not changed.
Can anyone shed any light on this?
I think you need the Cache-Control: must-revalidate header to get the browser to check back with the origin and receive a 304 Not Modified response if there is no change.
This is not optimal in terms of caching though.
You are better off invalidating the asset with a query-string change ("v=??") or setting a short Expires / max-age header (60 / 120 seconds, or whatever you can handle in terms of deployment, perhaps 5 minutes).
Having an Expires header combined with ETags should still mean the browser receives a 304 Not Modified from the server once the cached copy expires.
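If the images come from Azure Blob Storage behind the CDN, one way to get an explicit lifetime is to set the blob's CacheControl property, which the CDN should then pass through to browsers. A minimal sketch using the classic WindowsAzure.Storage SDK (the connection string, container and blob names are placeholders):

using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

var account = CloudStorageAccount.Parse("<storage connection string>");           // placeholder
var container = account.CreateCloudBlobClient().GetContainerReference("images");  // placeholder container
var blob = container.GetBlockBlobReference("photo.jpg");                          // placeholder blob

blob.FetchAttributes();                                   // load the current properties
blob.Properties.CacheControl = "public, max-age=300";     // explicit, short lifetime
blob.SetProperties();                                     // persist the header on the blob

With an explicit max-age in place, the browser caches the image for that long and then revalidates with If-None-Match, getting a 304 back if the ETag still matches.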
One of our applications is tested by Whitehat Sentinel and one of their findings was that in some cases our response header for Server is set to:
Microsoft-HTTPAPI/2.0
I have tried accessing the URL they identified with Postman and Fiddler, but I do not see the Server header. I have also tried an online web sniffer, http://web-sniffer.net/
Can someone advise how I can see this header?
In the Chrome Network tab I see these headers:
HTTP/1.1 404 Not Found
Content-Type: text/html
Strict-Transport-Security: max-age=300
X-Frame-Options: SAMEORIGIN
X-Content-Type-Options: nosniff
Date: Thu, 13 Jul 2017 13:59:15 GMT
Content-Length: 1245
No Server header.
The URL reported by Whitehat was not working for me, so I changed the target URL to domain.com/%%; this caused the request to be handled by http.sys directly, and it returned the Server header.
That is not the name of the header. That is the value found in the Server header when an application serves files over HTTP via http.sys, which is the kernel-mode HTTP server built into Windows.
For example, when serving a file via a C# HttpListener, I get this header:
Server: Microsoft-HTTPAPI/2.0
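A minimal sketch that reproduces this (the prefix and response body are arbitrary placeholders); http.sys adds the Server header on its own, without any code asking for it:

using System.Net;
using System.Text;

var listener = new HttpListener();
listener.Prefixes.Add("http://localhost:8080/");   // placeholder prefix
listener.Start();

var context = listener.GetContext();               // blocks until a request arrives
var body = Encoding.UTF8.GetBytes("hello");        // arbitrary response body
context.Response.ContentLength64 = body.Length;
context.Response.OutputStream.Write(body, 0, body.Length);
context.Response.Close();                          // sent via http.sys, which appends
                                                   // Server: Microsoft-HTTPAPI/2.0
listener.Stop();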
This header can be disabled by setting the following registry value:
Key: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\HTTP\Parameters
Name: DisableServerHeader
Type: DWORD
Value: 1
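If you prefer to set it programmatically, here is a sketch of the equivalent (it needs administrator rights, and the change typically only takes effect after the HTTP service is restarted):

using Microsoft.Win32;

Registry.SetValue(
    @"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\HTTP\Parameters",
    "DisableServerHeader",
    1,
    RegistryValueKind.DWord);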
I have a working Node/Express backend running on localhost. I'm building a project application that needs to fetch data from the Goodreads API. When I execute the request, I get:
Cross-Origin Request Blocked:
The Same Origin Policy disallows reading the remote resource at
https://www.goodreads.com/book/title.json?author=Arthur+Conan+Doyle&key=[my_key]&title=Hound+of+the+Baskervilles.
(Reason: CORS header 'Access-Control-Allow-Origin' missing).
Server side, everything is working correctly. I have enabled CORS, and when I check the headers in the Firefox and Chrome dev tools, 'Access-Control-Allow-Origin' is present on everything coming from my server. When I make a request via $resource, however, 'Access-Control-Allow-...' is not present in the header. Here is the code for the resource:
.factory('goodReads', function($resource) {
  return $resource('https://www.goodreads.com/book/title.json');
})
.controller('AddBookSelectorController', function($resource, goodReads) {
  this.fetch = function() {
    var key = '[my_key]';
    var data = goodReads.query({author: 'Arthur Conan Doyle', key: key, title: 'Hound of the Baskervilles'});
    console.log(data);
  };
});
I'm calling fetch via ng-click, and everything executes fine except that I get the CORS error. Can anyone point me in the right direction? I am new to Angular, and my suspicion is there is a problem with my resource request or something in the configuration, but I can't seem to find an answer to fix my problem in the documentation or other Stack Overflow questions.
Update 3: It is not a localhost issue. I tried pushing it to my domain and using a simple button which ran an XHR request to the OpenBooks API, and the problem got worse. It is hosted via OpenShift, and now the 'Access-Control-Allow-...' headers are gone even for other files on my server. Really beginning to bang my head against the wall here. I am removing the Angular tags, because it has nothing to do with Angular.
Update 2: I got it working after installing the 'Allow-Control-Allow-Origin' extension in Chrome. Has my problem been the fact that I'm running this on localhost, or is there something else going on? The header is still not being set without the extension.
Update: I've been working on this since 8 am, and still no luck. I have tried rewriting the request using Angular's $http and also with plain JavaScript XHR, following the example from HTML5 Rocks | Using CORS, and I'm still having the same problem with each method. Like I said, the necessary header information is present for files from my server, but it breaks when I make requests to other sites.
I'm starting to think this might not be an Angular problem, but I really have no clue. Just to be safe, here is the code I added to Express to enable CORS, including the surrounding app.use calls so you can get an idea of where I called it:
app.use(logger('dev'));
app.use(bodyParser.json());
app.use(bodyParser.urlencoded({ extended: false }));
app.use(cookieParser());
app.use(function(req, res, next) {
  res.header("Access-Control-Allow-Origin", "*");
  res.header("Access-Control-Allow-Headers", "Origin, X-Requested-With, Content-Type, Accept, Authorization, Content-Length");
  res.header("Access-Control-Allow-Methods", "GET, POST, PUT, DELETE, OPTIONS");
  res.header("Access-Control-Allow-Credentials", "true");
  next();
});
app.use(express.static(path.join(__dirname, 'public')));
app.use('/', routes);
Edit: Here are the headers from the API request:
Request Headers
Host: www.goodreads.com
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:40.0) Gecko/20100101 Firefox/40.0
Accept: application/json, text/plain, */*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
DNT: 1
Referer: http://localhost:3000/
Origin: http://localhost:3000
Connection: keep-alive
Response Headers
Cache-Control: max-age=0, private, must-revalidate
Content-Encoding: gzip
Content-Length: 686
Content-Type: application/json; charset=utf-8
Date: Wed, 02 Sep 2015 17:20:35 GMT
Etag: "a2be782f32638d2a435bbeaf4b01274a-gzip"
Server: Server
Set-Cookie: csid=BAhJIhg1MzgtNTk4NjMzNy0wNzQ4MTM5BjoGRVQ%3D--afed14b563e5a6eb7b3fa9005de3010474230702; path=/; expires=Sun, 02 Sep 2035 17:20:33 -0000
locale=en; path=/
_session_id2=fd45336b8ef86010d46c7d73adb5f004; path=/; expires=Wed, 02 Sep 2015 23:20:35 -0000; HttpOnly
Status: 200 OK
Vary: Accept-Encoding,User-Agent
X-Content-Type-Options: nosniff, nosniff
X-Frame-Options: ALLOWALL
X-Request-Id: 1K8EJWG30GWDE4MZ4R5K
X-Runtime: 2.277972
X-XSS-Protection: 1; mode=block
Headers for the .js file from my server:
Request
Host: localhost:3000
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:40.0) Gecko/20100101 Firefox/40.0
Accept: */*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
DNT: 1
Referer: http://localhost:3000/
Cookie: _ga=GA1.1.1924088292.1439681064; connect.sid=s%3AB4O0Up9WF5iqkfky__I0XCiBD2aMATlq.gbJUC9GseqnJvRTEIbcwxD6cwFQeL7ljNScURCJ5As0
Connection: keep-alive
If-Modified-Since: Wed, 02 Sep 2015 17:08:40 GMT
If-None-Match: W/"886-14f8f0828c1"
Cache-Control: max-age=0
Response:
Accept-Ranges: bytes
Access-Control-Allow-Origin: *
Cache-Control: public, max-age=0
Connection: keep-alive
Date: Wed, 02 Sep 2015 17:20:30 GMT
Etag: W/"886-14f8f0828c1"
Last-Modified: Wed, 02 Sep 2015 17:08:40 GMT
X-Powered-By: Express
access-control-allow-headers: Origin, X-Requested-With, Content-Type, Accept
I guess this problem exposed my ignorance, but maybe this will help other newbies to CORS like me. I finally figured out the problem after getting a copy of CORS in Action and working through the first example using the Flickr API.
My problem had nothing to do with the backend, Angular, jQuery's .ajax method, or XHR. All of my requests were properly formatted. The problem was that the APIs I attempted to use did not have CORS enabled on their servers. As soon as I changed the data type to JSONP, everything went through.
Anyway, for you newbs out there like me, here are some pointers to help you if you run into this problem:
1. Don't assume the API you are using has CORS enabled
I don't know why, but I blindly picked two APIs that don't have CORS enabled, which is what caused all the fuss for me. I had never run into this problem before because the APIs I had worked with were always from big companies like Flickr that have CORS enabled. If a provider doesn't set Access-Control-Allow-Origin on their server, you can ask them to enable it and use JSONP in the meantime.
If the API has an option for a callback parameter, that's a good sign you should use JSONP for your request. JSONP works by wrapping the response in a callback function and exploiting a feature of the script tag: scripts can be loaded from any domain, so it works as a hack to get the data. Here's a good link that helped me: Exactly What is JSONP? | CameronSpear.com
2. Check The Response Headers
I got tricked by this, but remember that the response headers on your request to an external API come from their server, not yours. It doesn't matter whether CORS is enabled on your server; you are making the request to someone else, and the browser automatically sends your origin information to them in the request headers. Remember, all of this checking is done by the browser for security reasons, so it's doing the heavy lifting for you on the request side based on your AJAX call. If Access-Control-Allow-Origin doesn't show up in the response headers, they don't have CORS enabled. If you are working on the frontend and requesting someone else's data, you can't do anything about it. Use JSONP and your problems will (probably) disappear.
This whole fiasco started because I was confusing responses coming from my server with responses coming from their server. I had correctly enabled CORS on my own server, but I thought it wasn't attaching the origin information to the request headers, which is why the header was absent in the response. In reality, everything was working correctly, but the API server simply didn't have CORS enabled.
So a day spent, but many lessons learned. Hopefully my wasted time helps someone else with their CORS problems. Note that my issue was stack-agnostic, so regardless of how you are making your request, checking the response headers is the first course of action to take if you run into a problem with CORS. After that, I would suggest looking into the request itself for errors.
Check out the book above or this link from the same author for more help, especially when it comes to non-simple requests: HTML5 Rocks | Using CORS.
When I punch in the URL for a secured database it displays the following message on the page:
{"error":"unauthorized","reason":"You are not authorized to access this db."}
Although this message certainly gets the point across, I would prefer to redirect the user to a different page, like a login page. I've tried changing the authentication_redirect option in the CouchDB config, but with no success. How would I go about doing this?
The authentication redirect only works if the client explicitly accepts the text/html content type (i.e. sends an Accept: text/html header):
GET /db HTTP/1.1
Accept: text/html
Host: localhost:5984
In this case, CouchDB will send an HTTP 302 response instead of HTTP 401, redirecting to the authentication form specified with the authentication_redirect configuration option:
HTTP/1.1 302 Moved Temporarily
Cache-Control: must-revalidate
Content-Length: 78
Content-Type: text/plain; charset=utf-8
Date: Tue, 24 Sep 2013 01:32:40 GMT
Location: http://localhost:5984/_utils/session.html?return=%2Fdb&reason=You%20are%20not%20authorized%20to%20access%20this%20db.
Server: CouchDB/1.4.0 (Erlang OTP/R16B01)
{"error":"unauthorized","reason":"You are not authorized to access this db."}
Otherwise, CouchDB cannot tell whether the request was sent by a human using a browser or by an application library. In the latter case, redirecting to an HTML form instead of returning an HTTP error is not a suitable response.
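To illustrate from the client side, here is a minimal sketch (the database URL is a placeholder, and auto-redirect is turned off so that the 302 and its Location header stay visible):

using System;
using System.Net.Http;
using System.Net.Http.Headers;

var handler = new HttpClientHandler { AllowAutoRedirect = false };
using var client = new HttpClient(handler);
client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("text/html"));

var response = await client.GetAsync("http://localhost:5984/db");   // placeholder database URL
Console.WriteLine((int)response.StatusCode);    // 302 with Accept: text/html, 401 without it
Console.WriteLine(response.Headers.Location);   // the authentication_redirect page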
I'm trying to write a small proxy server in C#. It works nicely for many webpages I have tested (including google.com and microsoft.com). For testing, I started my proxy server and configured IE 10 on Windows 8 to use it.
But when I try wikipedia.org, it only loads the main page and no pictures. I tried to load a single picture (http://upload.wikimedia.org/wikipedia/commons/6/63/Wikipedia-logo.png). When I use IE without the proxy it works, but with the proxy I get a 404 response.
This is the GET request which IE issues (my proxy just forwards it):
GET http://upload.wikimedia.org/wikipedia/commons/6/63/Wikipedia-logo.png HTTP/1.1
Accept: text/html, application/xhtml+xml, */*
Accept-Language: de-CH
User-Agent: Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; WOW64; Trident/6.0)
Accept-Encoding: gzip, deflate
Host: upload.wikimedia.org
DNT: 1
Proxy-Connection: Keep-Alive
IMHO it looks correct. This is the response I get (I omitted some HTML tags):
HTTP/1.1 404 Not Found
Content-Type: text/html; charset=UTF-8
X-Varnish: 1427845074 1427806476, 274786836, 3671934588
Via: 1.1 varnish, 1.1 varnish, 1.1 varnish
Content-Length: 262
Accept-Ranges: bytes
Date: Mon, 01 Jul 2013 21:30:54 GMT
Age: 28
Connection: keep-alive
X-Cache: cp1063 hit (1), cp3004 miss (0), cp3003 frontend miss (0)
Access-Control-Allow-Origin: *
...404 Not Found\n The resource could not be found.\nRegexp failed to match URI: "http:/upload.wikimedia.org/wikipedia/commons/6/63/Wikipedia-logo.png"
The strange part is here:
Regexp failed to match URI: "http:/upload.wikimedia.org/wikipedia/commons/6/63/Wikipedia-logo.png"
-> the URL starts with http:/ (only a single slash)
In the code I connect to upload.wikimedia.org like this:
// connect to upload.wikimedia.org
ServerSocket.Connect(RemoteHost, 80);
byte[] SendBuffer = Request.ToArray();
// send the client's request to the server
ServerSocket.Send(SendBuffer);
I have no idea why it doesn't work. Any help is appreciated. My full code is located on Github: Proxy_C_Sharp
I just found out why.
According to the HTTP/1.1 specification (http://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html#sec5) in Chapter 5.2.1:
"To allow for transition to absoluteURIs in all requests in future versions of HTTP, all HTTP/1.1 servers MUST accept the absoluteURI form in requests, even though HTTP/1.1 clients will only generate them in requests to proxies."
I tried it out with a small tool. If I make a request like this:
GET /wikipedia/commons/6/63/Wikipedia-logo.png HTTP/1.1
Host: upload.wikimedia.org
It works. So the reason is that Wikipedia's server does not conform to the standard: it should accept absolute URIs. It still works when I visit the site without a proxy because the browser only uses absolute URIs for requests sent to proxies; if no proxy is configured, it uses the relative (origin) form.
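So a practical fix on the proxy side is to rewrite the request line from the absolute form into the origin form before forwarding it. A rough sketch of a helper for that (the method name is mine, and it assumes the request line is the first line of the forwarded request):

using System;

static string ToOriginForm(string requestLine)
{
    // e.g. "GET http://upload.wikimedia.org/wikipedia/commons/6/63/Wikipedia-logo.png HTTP/1.1"
    var parts = requestLine.Split(' ');
    if (parts.Length == 3 && Uri.TryCreate(parts[1], UriKind.Absolute, out var uri))
    {
        parts[1] = uri.PathAndQuery;   // keep only "/wikipedia/commons/6/63/Wikipedia-logo.png"
        return string.Join(" ", parts);
    }
    return requestLine;                // already origin-form (or unparseable): leave it unchanged
}

The Host header already names the target server, so nothing else in the request needs to change.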