Browser behavior on history back and forward

I would like to understand a particular browser behavior with regard to the back and forward buttons. I always believed that in a "traditional" navigation, clicking the back and forward buttons caused the page to reload, unless we were just changing the hash on the same page (I can't recall now if there are other obvious exceptions).
Lately I've observed a behavior that shattered that belief: navigating to another page caused the browser to save a "snapshot" of the last state of the page before I navigated away, so whenever I pressed the back button it took me back to the exact moment I left.
While this can be a very useful behavior, the problem is that this way I'm showing the user potentially outdated data.
The same thing happens if I press the forward button: I land on the next page I visited, in the exact state it was in when I pressed the back button.
So I spent some time testing different browsers and different domains with the same application, under the same conditions, and came to the following results:
Chrome 102: this behavior happened when navigating on a public domain on the Internet with Dev Tools closed. Whenever I tried it on localhost, or with Dev Tools open, the pages reloaded.
Safari 15: this behavior always happened, regardless of whether I was on localhost or had Dev Tools open.
Opera 88: this behavior happened when navigating on a public domain on the Internet, regardless of whether Dev Tools was open. When navigating on localhost the pages reloaded.
Firefox 101 and Edge 102 didn't exhibit this behavior; they simply reloaded the pages when navigating.
Is there any explanation or documentation for this difference in behavior? I tried to search for it but nothing meaningful came up; perhaps I'm not using the right keywords.
Update
I checked the response headers, but I don't think they're relevant.
These are localhost's:
Connection: keep-alive
Date: Thu, 23 Jun 2022 08:58:01 GMT
ETag: W/"57cf-+fmYAdB1w1b8e3hCMCGT2IIRFCg"
Keep-Alive: timeout=5
For the public domains, the relevant headers are:
cache-control: max-age=0
date: Thu, 23 Jun 2022 10:10:23 GMT
expires: Thu, 23 Jun 2022 10:10:23 GMT
These tell the browser not to cache the page, which is why I didn't include them in the original question.
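As an aside, if I wanted to rule out any page snapshotting, my understanding is that a stronger directive than max-age=0 would be Cache-Control: no-store, which Chrome and Firefox also treat as a signal to skip their back/forward snapshot of the page (Safari may still restore it). A minimal sketch, assuming a PHP backend purely for illustration (it is not necessarily my actual stack):

<?php
// Hypothetical endpoint: no-store tells the browser not to keep any copy,
// which (in Chrome and Firefox at least) also makes the page ineligible
// for the back/forward snapshot, so Back/Forward trigger a fresh load.
header('Cache-Control: no-store, must-revalidate');
echo 'Rendered at ' . gmdate('D, d M Y H:i:s') . ' GMT';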

Related

How do I debug broken response headers in Apache?

We're trying to get Webdav running in Kubernetes using an Azure Files storage backend, which is mounted in the container on /dav/data. The container itself is running Alpine Linux 3.12.1, in which we're installing all our required Apache packages.
All is well and good when not mounting this storage, or when using a different storage backend. However, when mounting the Azure Files storage, stuff starts to break.
Uploading files works without issue, but downloading does not; most software complains about invalid HTTP headers/responses. When investigating this further, I see that the beginning of the headers seems to be getting cut off.
Example headers of a correct response (obtained by not mounting the volume):
HTTP/1.1 200 OK
Date: Tue, 01 Dec 2020 13:54:53 GMT
Server: Apache/2.4.46 (Unix)
Last-Modified: Tue, 01 Dec 2020 13:51:02 GMT
ETag: "bla"
Accept-Ranges: bytes
Content-Length: 985
Connection: close
Example headers of an incorrect response:
s: bytes
Content-Length: 985
Connection: close
Everything up to the first s in the Accept-Ranges header seems to be getting eaten somewhere. There also seem to be a number of extra null bytes at the end of the response.
In an effort to get to the bottom of this I looked into logging as much as I possibly could, and stumbled upon the DumpIO module, which would allow me to log both the response headers as well as the body. For some reason, loading this module, setting DumpIOOutput On and LogLevel dumpio:trace7 actually fixes the issue. Response headers are fine, and the response body is exactly what you'd expect. And it's driving me nuts.
I suspect there's some kind of weird buffer/window issue being caused by an interaction between Apache and the mounted volume, but I haven't been able to figure out what.
We've since changed the storage backend used for the volume, but I'd still really like to know what caused this issue.
I've also been able to reproduce this locally in Docker.
Having the exact same issue after upgrading the AKS cluster from 1.17 to 1.18. Headers are malformed. Tried updating to a newer Apache version, but that doesn't work. Temporarily switched from azureFile to azureDisk and that works! Will see if I can create an AKS bug report for this.
In case you haven't found the culprit yet (or for anyone else coming across this while googling, like I did): the issue is Apache's EnableMMAP setting.
Solution: In your Apache conf, set EnableMMAP off.
Sources:
https://cloudiseasy.com/2021/06/13/deploying-apache-server-on-aks-with-azure-files/
https://httpd.apache.org/docs/2.4/mod/core.html#enablemmap
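For reference, a minimal sketch of that change, scoped to the WebDAV mount from the question (adjust the path and conf file to your own layout):

# Memory-mapping files served from the Azure Files mount appears to be what
# corrupts the responses here, so turn it off for that directory.
<Directory "/dav/data">
    EnableMMAP Off
</Directory>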

Varnish and WordPress: is real caching possible without an external plugin?

Maybe this sounds like a novice question in the Varnish Cache world, but why does WordPress seem to need an external cache plugin in order to be fully cached?
Websites are loaded correctly via Varnish; here is the output of a curl -I command:
HTTP/1.1 200 OK
Server: nginx/1.11.12
Date: Thu, 11 Oct 2018 09:39:07 GMT
Content-Type: text/html; charset=UTF-8
Connection: keep-alive
Vary: Accept-Encoding
Cache-Control: max-age=0, public
Expires: Thu, 11 Oct 2018 09:39:07 GMT
Vary: Accept-Encoding
X-Varnish: 19575855
Age: 0
Via: 1.1 varnish-v4
X-Cache: MISS
Accept-Ranges: bytes
Pragma: public
Cache-Control: public
Vary: Accept-Encoding
With this configuration, by default WordPress installations are not being cached.
After testing multiple cache plugins (some not working, or not working without complex configuration), I found Swift Performance: in its Lite version, simply activating the Cache option really takes full advantage of Varnish, and I can see it working fully, with very good results in stress tests.
This could be OK for a single site in a single environment, but in shared hosting, where every customer can have their own WP (or other CMS) installation, it could be a problem.
So is there really no way to take full advantage of Varnish caching without installing complex third-party caching plugins? Why not cache everything by default?
Any suggestions and help will be highly welcome; thanks in advance.
With this configuration, by default WordPress installations are not being cached
By default, if you don't change anything in either the WordPress or the Varnish configuration, the two work together in such a way that WordPress pages are cached for 120 seconds (Varnish's default TTL). So real caching is possible, but it will be a short-lived and highly ineffective cache.
Your specific headers indicate that no caching should happen. They are either sent by Varnish itself (we're all guilty of copy-pasting stuff without thinking about what it does) or by a WordPress plugin (more often a bad one than a good one). Without knowing your specific configuration, it's hard to decipher anything.
Varnish is a transparent HTTP caching proxy. That means that, by default, it uses the HTTP headers sent by the backend (WordPress), such as Cache-Control, to decide whether a resource can be cached and for how long.
WordPress, in fact, does not send cache-related headers other than in a few specific areas (error pages, login POST submissions, etc.).
The standard approach outlined here is to configure Varnish with a high TTL. With that:
Varnish has no idea when you update an article's contents or change the theme. The typical solution lies in using a cache-invalidation plugin like Varnish HTTP Purge.
The plugin requirement comes from the need to purge the cache when content changes.
Suppose you update a WordPress page's text. That page had been visited previously and went into Varnish's cache. What happens on the next visit is that Varnish serves the same, now stale, content to all subsequent visitors.
The WordPress plugins for Varnish, like Varnish HTTP Purge, hook into WordPress so that they instruct Varnish to clear its cache when pages are updated. This is their primary purpose.
That kind of approach (high TTL plus cache purging) is the de facto standard with Varnish. Since Varnish has no information about when you update content, the responsibility for purging the cache lies with the application itself. The cache-purging feature is either bundled into the CMS code itself (Magento 2, for example, has it out of the box, without any extra plugins) or provided by a WordPress plugin, as sketched below.
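To illustrate, here is a stripped-down sketch of what such a plugin does under the hood. The function name and the wiring are mine, and it assumes Varnish accepts PURGE requests from this server in its vcl_recv; real plugins like Varnish HTTP Purge handle far more cases (archives, home page, authentication, configurable Varnish hosts):

<?php
/*
Plugin Name: Minimal Varnish Purge (illustrative sketch)
*/

// When a post is saved, ask Varnish to drop its cached copy of that post's URL.
function minimal_varnish_purge( $post_id ) {
    $url = get_permalink( $post_id );
    if ( ! $url ) {
        return;
    }
    // Send a PURGE request for the post's permalink.
    wp_remote_request( $url, array( 'method' => 'PURGE' ) );
}
add_action( 'save_post', 'minimal_varnish_purge' );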

Is Chrome ignoring Cache-Control: max-age?

Background:
IIS 7
AspNet 3.5 web app
Chrome dev tools lists 98 requests for the home page of the web app (aspx + js + css + images). On subsequent loads, the status code is 200 for CSS/image files. With no cache info, the browser asks the server each time whether the file has to be updated. OK.
In IIS 7 I set the HTTP cache-control header to 6 hours for the "ressources" folder. In Chrome, using dev tools, I can see that the header is correctly set in the response:
Cache-Control: max-age=21600
But I still get 98 requests... I thought the browser should not request a resource as long as its expiration date has not been reached, and I was expecting the number of requests to drop...
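For readers wondering how that is typically configured in IIS 7: a web.config fragment along these lines, placed in the "ressources" folder, produces the max-age=21600 header shown above (the exact configuration used in the question isn't shown, so treat this as an assumption):

<configuration>
  <system.webServer>
    <staticContent>
      <!-- Cache static files for 6 hours (21600 seconds) -->
      <clientCache cacheControlMode="UseMaxAge" cacheControlMaxAge="06:00:00" />
    </staticContent>
  </system.webServer>
</configuration>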
I got it. Google Chrome ignores the Cache-Control or Expires header if you make a request immediately after another request to the same URI in the same tab (by clicking the refresh button, pressing the F5 key or pressing Command + R). It probably has an algorithm to guess what the user really wants to do.
A way to test the Cache-Control header is to return an HTML document with a link to itself. When clicking the link, Chrome serves the document from the cache. E.g., name the following document self.html:
<!doctype html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>Test Page</title>
  </head>
  <body>
    <p>
      <a href="self.html">Link</a> to the same page.
      If correctly cached, a request should not be made
      when clicking the link.
    </p>
  </body>
</html>
Another option is to copy the URL and paste it in the same tab or another tab.
UPDATE: A Chrome post published on January 26, 2017 describes the previous behavior and how it is changing so that only the main resource is revalidated, not the sub-resources:
Users typically reload either because a page is broken or the content seems stale. The existing reload behavior usually solves broken pages, but stale content is inefficiently addressed by a regular reload, especially on mobile. This feature was originally designed in times when broken pages were quite common, so it was reasonable to address both use cases at once. However, this original concern has now become far less relevant as the quality of web pages has increased. To improve the stale content use case, Chrome now has a simplified reload behavior to only validate the main resource and continue with a regular page load. This new behavior maximizes the reuse of cached resources and results in lower latency, power consumption, and data usage.
In a Facebook post also published on January 26, 2017, it is mentioned that they found a piece of code where Chrome invalidates all cached resources after a POST request:
we found that Chrome would revalidate all resources on pages that were loaded from making a POST request. The Chrome team told us the rationale for this was that POST requests tend to be pages that make a change — like making a purchase or sending an email — and that the user would want to have the most up-to-date page.
It seems this is not the case anymore.
Finally, it is described that Firefox is introducing Cache-Control: immutable to completely stop revalidation of resources:
Firefox implemented a proposal from one of our engineers to add a new cache-control header for some resources in order to tell the browser that this resource should never be revalidated. The idea behind this header is that it's an extra promise from the developer to the browser that this resource will never change during its max-age lifetime. Firefox chose to implement this directive in the form of a cache-control: immutable header.
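As a sketch of what that looks like from the server side, here is a hypothetical PHP handler for a fingerprinted static asset (the file name and the one-year max-age are illustrative, not taken from the quoted post):

<?php
// Hypothetical handler for a fingerprinted asset (e.g. app.3f2a1c.js).
// The content at this URL never changes, so the browser may reuse it
// for a full year without ever revalidating it.
header('Content-Type: application/javascript');
header('Cache-Control: public, max-age=31536000, immutable');
readfile(__DIR__ . '/assets/app.3f2a1c.js');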
Chrome appears to be ignoring your Cache-Control settings if you're reloading in the same tab. If you copy the URL to a new tab and load it there, Chrome will respect the cache control tags and reuse the contents from the cache.
As an example I had this Ruby Sinatra app:
#!/usr/bin/env ruby
require 'sinatra'

before do
  content_type :txt
end

get '/' do
  headers "Cache-Control" => "public, must-revalidate, max-age=3600",
          "Expires" => Time.at(Time.now.to_i + (60 * 60)).to_s
  "This page rendered at #{Time.now}."
end
When I continuously reloaded it in the same Chrome tab it would display the new time.
This page rendered at 2014-10-08 13:36:46 -0400.
This page rendered at 2014-10-08 13:36:48 -0400.
The headers looked like this:
< HTTP/1.1 200 OK
< Content-Type: text/plain;charset=utf-8
< Cache-Control: public, must-revalidate, max-age=3600
< Expires: 2014-10-08 13:36:46 -0400
< Content-Length: 48
< X-Content-Type-Options: nosniff
< Connection: keep-alive
* Server thin is not blacklisted
< Server: thin
However, accessing the same URL, http://localhost:4567/, from multiple new tabs would recycle the previous result from the cache.
After doing some tests with Cache-Control: max-age=xxx:
Pressing the reload button: header ignored
Entering the same URL in any tab (current or not): honored
Using JS (window.location.reload()): ignored
Using Developer Tools (with Disable cache unselected) or incognito mode makes no difference
So, the best option while developing is to put the cursor in the omnibox and press Enter instead of clicking the refresh button.
Note: a right click on the refresh icon will show the refresh options (Normal, Hard, Empty Cache). Incredibly, none of these affect these headers.
If Chrome Developer Tools are open (F12), Chrome usually disables caching.
It is controllable in the Developer Tools settings - the Gear icon to the right of the dev-tools top bar.
While this question is old, I wanted to add that if you are developing using a self-signed certificate over HTTPS and there is an issue with the certificate, then Chrome will not cache the response no matter what cache headers you use.
This is noted in this bug report:
https://bugs.chromium.org/p/chromium/issues/detail?id=110649
This is in addition to kievic's answer.
To force the browser NOT to send a Cache-Control header in the request, open the Chrome console and type:
location = "https://your.page.com"
To force the browser to add this header, click the "reload" button.
Quite an old question, but I noticed just recently (2020), that Chrome sometimes ignores the Cache-Control headers for my image resources when browsing using an Incognito window.
"Sometimes" because in my case the Cache-Control directive was honored for small images (~60-200KB), but not for larger ones (10MB).
Not using Incognito window resulted in Chrome using the disk cached version even for the large images.
Another tip:
Do not forget to verify the "Date" header: if the server has an incorrect date/time (or is located in another time zone), Chrome will keep requesting the resource again and again.

Reload vs Refresh

I have this script
<?php
header("Expires: Sat, 11 Jun 2011 00:00:00 GMT");
echo "Hello World";
?>
It just writes "Hello World" and set the cache to expire on next Saturday.
Now, when I load this page in FireFox and click on reload button, it makes a new request to server to load the page instead of just serving it from cache (I think to ensure if last-modified is still valid).
However, if I put my cursor on the address bar and press Enter, FireFox serves the contents from cache.
Why is that so? Why does in first case (reload) it makes a request to server, but in second case (refresh, I guess?) it serves from cache?
I think the terms 'refresh' and 'reload' are basically synonymous. I see this line in RFC 2616 (which describes HTTP/1.1 caching) that suggests a possible slight difference:
An expiration time cannot be used to force a user agent to refresh its display or reload a resource
In other words, perhaps you could say refreshing is for displays, and reloading is for resources. But since browsers' primary use for resources is display, I don't see a difference.
Here's a short writeup on the terms by a developer who has dealt with browser cache control. The terms he prefers are these:
load: hit Enter in the address bar; click on links
reload: F5; Ctrl+R; toolbar's refresh button; Menu -> Reload
hard reload: Ctrl+F5; Ctrl+Shift+R
(The hard reload forces the browser to bypass its cache. For Firefox, you hold down Shift and press the reload button. Wikipedia has a list of how to do this for common browsers. You can test its effect on this page.)
To answer your question about how Firefox decides when to refresh, here is how the link from above explains it:
load: no request happens until the cached resource expires
reload: the request contains the If-Modified-Since and Cache-Control: max-age=0 headers that allow the server to respond with 304 Not Modified if applicable
hard reload: the request contains the Pragma: no-cache and Cache-Control: no-cache headers and will bypass the cache
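To make the reload case concrete, here is a minimal PHP sketch of a handler that answers those conditional requests with 304 Not Modified when nothing has changed (the one-hour max-age is an arbitrary choice for the example):

<?php
// Use this script's own modification time as the resource's Last-Modified time.
$lastModified = filemtime(__FILE__);

header('Cache-Control: max-age=3600');
header('Last-Modified: ' . gmdate('D, d M Y H:i:s', $lastModified) . ' GMT');

// A reload sends If-Modified-Since; if the resource hasn't changed,
// answer 304 with no body so the browser reuses its cached copy.
if (isset($_SERVER['HTTP_IF_MODIFIED_SINCE']) &&
    strtotime($_SERVER['HTTP_IF_MODIFIED_SINCE']) >= $lastModified) {
    http_response_code(304);
    exit;
}

echo "Hello World";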
When people refresh a page, they generally expect to see new results, so caching of the entire page doesn't make much sense.

Pocket IE: Still seems to be caching?

I'm having trouble with a particular version of Pocket IE running under Windows Mobile 5.0. Unfortunately, I'm not sure of the exact version numbers.
We had a problem whereby this particular 'installation' would return a locally cached version of a page when the wireless network was switched off. Fair enough, no problem. We cleared the cache of the handheld and started sending the following headers:
Expires: Mon, 26 Jul 1997 05:00:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Last-Modified: Thu, 30 Jul 2009 16:42:08 GMT
The Last-Modified header is calculated on the fly and set to 'now'.
Even so, the handheld still seems to be caching these pages: the page is sent with the headers, but when they disconnect the wireless network and click a link to the page (which was not supposed to be cached), it still returns the cached file.
Are there some other headers that should be sent, or is this just a problem with Pocket IE? Or is it possibly something else entirely?
Thanks!
I'm not sure I can answer your question since I have no Pocket IE to test with, but maybe I can offer something that can help.
This is a very good caching reference: http://www.mnot.net/cache_docs/
Also, I'm not sure whether your example is the pasted results of your headers, or the code that you've set up to send the headers, but I believe the collection of headers in most language implementations (and by extension I assume most browser implementations) is treated as a map; therefore, it's possible you've overwritten "no-store, no-cache, must-revalidate" with the second "Cache-Control" header. In other words, only one can get sent, and if last wins, you only sent "post-check=0, pre-check=0".
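If those headers happen to be set from PHP (the question doesn't say, so this is purely an assumption), the safe patterns look like this: either send a single combined Cache-Control value, or pass false as header()'s second argument so a second call appends rather than replaces:

<?php
// Either send everything as one Cache-Control header, so nothing is overwritten...
header('Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0');

// ...or, if two calls are unavoidable, pass false so the second call appends
// another Cache-Control header instead of replacing the first one:
// header('Cache-Control: no-store, no-cache, must-revalidate');
// header('Cache-Control: post-check=0, pre-check=0', false);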
You could also try adding a max-age=0 directive to Cache-Control.
In my experience both Firefox and IE have seemed more sensitive to pages served by HTTPS as well. You could try that if you have it as an option.
If you still have no luck, and Pocket IE is behaving clearly differently from Windows IE, then my guess is that the handheld has special rules for caching based on the assumption that it will often be away from internet connectivity.
Edit:
After you mentioned CNN.com, I realized that you do not have the "private" directive in Cache-Control. I think this is what is making CNN.com cache the page but not yours. I believe "private" is the strictest setting available in the Cache-Control header. Try adding it.
For example, here are CNN's headers. (I don't think listing "private" twice has any effect)
Date: Fri, 31 Jul 2009 16:05:42 GMT
Server: Apache
Accept-Ranges: bytes
Cache-Control: max-age=60, private, private
Expires: Fri, 31 Jul 2009 16:06:41 GMT
Content-Type: text/html
Vary: User-Agent,Accept-Encoding
Content-Encoding: gzip
Content-Length: 21221
200 OK
If you don't have the Firefox Web Developer Toolbar, it's a great tool for checking the response headers of any site - in the "Information" dropdown, "View Response Headers" is at the bottom.
Although Renesis has been awesome in trying to help me here, I've had to give up.
By 'give up' I mean I've cheated. Instead of trying to resolve this issue on the client side, I went the server side route.
What I ended up doing was writing a PHP function that takes a URL and essentially makes it unique. It does this by adding a random GET parameter based on a call to uniqid(). I then do a couple of other little things to it: I add a '?' or a '&' to the URL depending on whether other GET parameters exist, make sure that any '#' anchor is pushed right to the end, and then return that URL to the browser.
This essentially resolves the issue as each link the browser ever sees is unique: it's never seen that particular URL before and so can't retrieve it from the cache.
Hackish? Yes. Working? So far, so good.
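For anyone curious, a rough sketch of that kind of helper (the function and parameter names are mine; the original surely differs in its details):

<?php
// Append a throwaway uniqid() parameter so the handheld never sees the
// same URL twice, keeping any #fragment at the very end of the URL.
function make_unique_url($url)
{
    $fragment = '';
    $hashPos = strpos($url, '#');
    if ($hashPos !== false) {
        $fragment = substr($url, $hashPos);
        $url = substr($url, 0, $hashPos);
    }

    $separator = (strpos($url, '?') === false) ? '?' : '&';

    return $url . $separator . 'nocache=' . uniqid() . $fragment;
}

// Example output: http://example.com/page?id=1&nocache=5f3a2b1c9d8e7#top
echo make_unique_url('http://example.com/page?id=1#top');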

Resources