How does the browser know a web page has changed?

This is a deceptively simple thing I feel I should know more about - but I don't, and I can't find much written about it.
The question is: How exactly does a browser know a web page has changed?
Intuitively I would say that F5 refreshes the cache for a given page, and that the cache is used only for history navigation and has an expiration date. That would mean the browser never knows whether a web page has changed, and just reloads the page once the cache is gone - but I am sure this is not always the case.
Any pointers appreciated!

Browsers will usually get this information through HTTP headers sent with the page.
For example, the Last-Modified header tells the browser when the page last changed. A browser can send a simple HEAD request for the page to get the current Last-Modified value. If it's newer than what the browser has in its cache, the browser can reload the page.
There are a bunch of other headers related to caching as well (like Cache-Control). Check out: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
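As a rough sketch in JavaScript (the URL and the saved Last-Modified value are placeholders for illustration):
var cachedLastModified = 'Tue, 15 Nov 1994 12:45:26 GMT'; // value saved from an earlier response
var xhr = new XMLHttpRequest();
xhr.open('HEAD', '/some-page.html', true);
xhr.onreadystatechange = function () {
    if (xhr.readyState === 4) {
        // compare the server's Last-Modified against the saved value
        if (xhr.getResponseHeader('Last-Modified') !== cachedLastModified) {
            location.reload(); // the page changed, so fetch it again
        }
    }
};
xhr.send();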

Don't guess; read the docs. Here's a friendly, but authoritative introduction to the subject.

Web browsers send HTTP requests and receive HTTP responses. They then display the contents of the HTTP responses, which will typically contain HTML. Many HTML elements trigger additional requests to fetch the various parts of the page; each image, for example, is typically another HTTP request.
There are HTTP headers that indicate whether a page has changed, for example the Last-Modified date. Web browsers typically use a conditional GET (with a conditional header field such as If-Modified-Since) or a HEAD request to detect changes. A HEAD request returns only the headers, not the requested resource itself.
A conditional GET HTTP request will return a status of 304 Not Modified if there are no changes.
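As a minimal sketch (the URL and date are placeholders), a conditional GET from JavaScript looks like this:
var xhr = new XMLHttpRequest();
xhr.open('GET', '/some-page.html', true);
// ask the server to send the body only if it changed after this date
xhr.setRequestHeader('If-Modified-Since', 'Tue, 15 Nov 1994 12:45:26 GMT');
xhr.onreadystatechange = function () {
    if (xhr.readyState !== 4) return;
    if (xhr.status === 304) {
        // not modified: keep using the cached copy
    } else if (xhr.status === 200) {
        // modified: xhr.responseText holds the fresh page
    }
};
xhr.send();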
The page can then later change based on:
User input
After user input, changes can happen based on javascript code running without a postback
After user input, a new request may be sent to the server to get a whole new (possibly identical) page.
Javascript code can run once a page has loaded and change things at any time. For example, you may have a timer that changes something on the page; a minimal sketch follows this list.
Some pages also contain HTML tags that will scroll or blink or have other behavior.
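A minimal sketch of the timer case, assuming a placeholder element with id 'clock':
setInterval(function () {
    // update part of the page once a second, long after the initial load
    document.getElementById('clock').textContent = new Date().toLocaleTimeString();
}, 1000);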

You're getting along the right track, and as Jonathan mentioned, nothing is better than reading the docs. However, if you only want a bit more information:
There are HTTP response headers that let the server set the cacheability of a page, which matches the expiration-date scheme you describe. Another important construct is the HTTP HEAD request, which retrieves only the response headers (such as Content-Type and Content-Length, if available) for a given page. Browsers can use a HEAD request to validate what is in their caches...
There is definitely more info on the subject though, so I would suggest reading the docs...

Related

Status code without requiring webRequest permission?

I'm writing a Chrome extension to fall back to the Wayback Machine when a link fails.
webNavigation seems sufficient for the DNS-lookup case, but I don't see a way to detect link failure with only webNavigation in general.
For example, http://www.google.com/adasdasdasdasdasdasd is a 404 link - but I still get webNavigation onDOMContentLoaded and onCompleted, without indication of HTTP error (no onErrorOccurred is sent).
I was really hoping to avoid needing the webRequest permission with wide-open host patterns. Is there a way to detect HTTP failure that I'm missing?
Send an XMLHttpRequest HEAD request in onBeforeNavigate and analyze the response status code in the onreadystatechange callback. If it's 404, use chrome.tabs.update to change the tab URL.
The drawback of sending an additional request for every page is insignificant since web pages usually generate a lot more requests while loading.
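Here is a rough sketch of that approach, assuming a Manifest V2 background script with the webNavigation and tabs permissions; the Wayback Machine URL prefix is my assumption:
chrome.webNavigation.onBeforeNavigate.addListener(function (details) {
    if (details.frameId !== 0) return; // only check top-level navigations
    var xhr = new XMLHttpRequest();
    xhr.open('HEAD', details.url, true);
    xhr.onreadystatechange = function () {
        if (xhr.readyState === 4 && xhr.status === 404) {
            // dead link: send the tab to the Wayback Machine instead
            chrome.tabs.update(details.tabId, {
                url: 'https://web.archive.org/web/' + details.url
            });
        }
    };
    xhr.send();
});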

Detecting where user has come from a specific website

I want to know whether it's possible to detect which website a user has come from, and serve them different content based on it.
So if they've come from any other website on the internet and landed on my page, they will see my normal HTML and CSS page; but if they come from one specific website (which I also developed, so I have control over its code server-side and client-side), I want them to see something slightly different.
It's a very small difference that I want them to see, and that's why I don't want to consider taking them to a different version of the website or a different page.
I'm also not sure whether this solution should be placed on the page they're coming from or the page they're arriving on.
Hope that's clear. Thanks!
I would add a URL parameter like http://example.com?source=othersite. This way you can easily adjust the parameter and can use javascript to detect this and slightly alter your landing page.
Otherwise, you can use the HTTP referrer sent via the browser to detect where they came from, but you would need to tell us your back end technology to get an example of that, as it differs a bit.
In javascript, you can do something as easy as
// check whether the landing URL carries the source parameter
if (window.location.href.indexOf('source=othersite') !== -1)
{
// alter DOM here
}
Or you can use a URL Parameter parser as suggested here: How to get the value from the GET parameters?
What you want is the Referer: HTTP header. It gives you the URL of the page the user came from. Bear in mind that the Referer can easily be spoofed, so don't treat it as a guarantee if security is an issue.
Browsers may disable the referer, though. Why not just use a URL parameter?
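If you do go the Referer route, a minimal client-side sketch (the site name and class name are placeholders):
// document.referrer exposes the Referer header to script on the landing page
if (document.referrer.indexOf('othersite.example') !== -1) {
    document.body.className += ' from-othersite'; // hook for the slightly different styling
}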

How do I override xhr-src?

I created a userscript for myself which is active on all webpages I visit. It sends data to my debugger/app via jQuery's post ($.post).
I noticed one site not allowing me to send data, even though it worked before; after a quick look it appears there is some kind of error via xhr-src. The response headers contain an 'X-Content-Security-Policy' which lists a bunch of sites (Google being one). So when I try to do a post to localhost:myport/ it violates the policy and thus doesn't post.
What can I do to get this working again? I can't exactly edit the headers (unless I write my own HTTP proxy?). Would I be able to create an iframe using localhost:1234/workaround and post via that? But the issue is I still don't know if that's a violation or how to give it data.

How can I verify that javascript and images are being cached?

I want to verify that the images, CSS, and javascript files that are part of my page are being cached by my browser. I've used Fiddler and Google Page Speed, and it's unclear whether either is giving me the information I need. Fiddler shows the HTTP 304 response for images, CSS, and javascript, which should tell the browser to use the cached copy. Google Page Speed shows the 304 response but doesn't show a transfer size of zero; instead it shows the full file size of the resource. Note also, I have seen Google Page Speed report a 200 response but then put the word (cache) next to the 200 (so the status is 200 (cache)), which doesn't make a lot of sense.
Any other suggestions as to how I can verify whether the server is sending back images, css, javascript after they've been retrieved and cached by a previous page hit?
In-browser HTTP debuggers are probably the easiest to use in your situation. Try HttpFox for Firefox, or Opera, which has Dragonfly built in. Both of these indicate when the local browser cache has been used.
If you appear to be getting conflicting information, then wireshark/tcpdump will show you if the objects are being downloaded or not as it is monitoring the actual network packets being transmitted and received. If you haven't looked at network traces before, this might be a little confusing at first.
In Fiddler, check that the response body (for images, css) is empty. Also make sure your max-age is long enough in the Cache-Control header. Most browsers (Safari, Firefox) have good traffic analyzer tools.
Your server's access logs can give you a lot of information about how effective your caching strategy is.
Let's say you have an HTML page /home.html, which references /some.js and /lookandfeel.css. For a given time period, aggregate the number of requests to all three files.
If your caching is effective, you should see a huge number of requests for home.html, but very few for the css or js. Somewhere in between is when you see an identical number of requests for all three, but the css and js get 304s. The worst case is when you are only seeing 200s.
Obviously, you have to know your application to do such a study. The js and css files may be shared across multiple pages - which may complicate your analysis. But the general idea still holds good.
The advantage of such a study is that you can find out how effective your caching strategy is for your users, as opposed to 'Is caching working on my machine?'. However, this is no substitute for using an HTTP proxy / Fiddler.
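As a rough sketch of that aggregation, assuming Node.js and a combined-format access log at 'access.log' (the three paths come from the example above):
var fs = require('fs');
var counts = { '/home.html': 0, '/some.js': 0, '/lookandfeel.css': 0 };
fs.readFileSync('access.log', 'utf8').split('\n').forEach(function (line) {
    Object.keys(counts).forEach(function (path) {
        // count GET requests for each resource
        if (line.indexOf('GET ' + path + ' ') !== -1) counts[path]++;
    });
});
console.log(counts); // effective caching: far fewer hits on the js/css than on the html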
An HTTP 304 response is forbidden to have a body, so the full response isn't sent; you just get back the headers. But the round trip itself isn't free, so sending proper expiration information is good practice for performance: it avoids making the conditional request that returns the 304 in the first place.
http://www.fiddler2.com/redir/?id=httpperf explains this topic in some detail.
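For instance, a minimal Node.js sketch of sending that expiration information (the one-year max-age is just an example value):
var http = require('http');
http.createServer(function (req, res) {
    // a long max-age lets the browser skip even the conditional request
    res.setHeader('Cache-Control', 'public, max-age=31536000'); // one year
    res.setHeader('Content-Type', 'application/javascript');
    res.end('/* static js body */');
}).listen(8080);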

How do I get a web part to refresh in IE?

I have a SharePoint web part that gets XML data from an .ASHX page, parses it and displays it using JavaScript. Everything works fine, until the XML changes. When I view the web part in IE, the new data is not updated until I close the browser. Even doing a CTRL-F5 does not grab the new data.
Firefox displays the new data immediately, with just a simple page refresh.
I have added a timestamp to the query string of my .ASHX page so that the XML result is not cached, but that did not fix my IE woes. Any other ideas?
Edit
The .ASHX page uses the API to access a list and builds the XML string, then returns it as an application/xml content type. I have confirmed that the XML is updated to reflect the new data in the list. I am also able to see the data consumed in the web part when it is displayed in Firefox.
Solution
I was actually generating the timestamp to append to my query in the server code, and then putting that string in the javascript. Once I moved the timestamp code into the javascript, things started working much better.
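A sketch of that fix, generating the cache-buster in the browser rather than on the server ('/mydata.ashx' is a placeholder handler URL):
// a fresh timestamp per request defeats IE's aggressive caching
var url = '/mydata.ashx?ts=' + new Date().getTime();
var xhr = new XMLHttpRequest();
xhr.open('GET', url, true);
xhr.onreadystatechange = function () {
    if (xhr.readyState === 4 && xhr.status === 200) {
        var xml = xhr.responseXML; // parse and display the XML as before
    }
};
xhr.send();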
It's most likely cached; there's no other logical explanation for why it would work in Firefox and not in IE. Try reloading IE several times in a row.
Check what headers that .ashx page sets.
It's not just your browser that could be caching the page; any middleware, including the web server, might have a flawed caching implementation. You can also try using HTTP POST instead of GET because, per the HTTP specification, responses to POST requests are not cached by default.
Is it a custom Web Part or one of the out-of-the-box Web Parts? It would be easier to help you if you provided more information on how you're retrieving the data from the HttpHandler (ashx).
