How can I verify that JavaScript and images are being cached?

I want to verify that the images, CSS, and JavaScript files that are part of my page are being cached by my browser. I've used Fiddler and Google Page Speed, and it's unclear whether either is giving me the information I need. Fiddler shows the HTTP 304 response for images, CSS, and JavaScript, which should tell the browser to use its cached copy. Google Page Speed shows the 304 response but doesn't show a transfer size of zero; instead it shows the full file size of the resource. Note also that I have seen Google Page Speed report a 200 response but then put the word (cache) next to the 200 (so the status is "200 (cache)"), which doesn't make a lot of sense.
Any other suggestions as to how I can verify whether the server is re-sending images, CSS, and JavaScript after they've already been retrieved and cached by a previous page hit?

In-browser HTTP debuggers are probably the easiest to use in your situation. Try HttpFox for Firefox, or Opera, which has Dragonfly built in. Both of these indicate when the local browser cache has been used.
If you appear to be getting conflicting information, then Wireshark or tcpdump will show you whether the objects are actually being downloaded, since they monitor the raw network packets being transmitted and received. If you haven't looked at network traces before, this might be a little confusing at first.

In Fiddler, check that the response body (for images, CSS) is empty. Also make sure the max-age in your Cache-Control header is long enough. Most browsers (Safari, Firefox) have good built-in traffic analysis tools.
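If you're serving the assets with Express, a minimal sketch of setting a long max-age on static files might look like this (the /static mount point and ./public directory are just assumptions for illustration):

    // Serve static assets with a long max-age so the browser can reuse its
    // cached copy without re-requesting it until the age expires.
    const express = require('express');
    const app = express();

    app.use('/static', express.static('public', {
      maxAge: '30d',        // sends Cache-Control: public, max-age=2592000
      etag: true,           // lets the browser revalidate with If-None-Match -> 304
      lastModified: true    // lets the browser revalidate with If-Modified-Since -> 304
    }));

    app.listen(3000);

With headers like these, a proxy such as Fiddler should show no request at all for a warm cache within the max-age window, and an empty-bodied 304 when the browser revalidates after it expires.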

Your server's access logs can give you a lot of information on how effective your caching strategy is.
Let's say you have an HTML page /home.html, which references /some.js and /lookandfeel.css. For a given time period, aggregate the number of requests to all three files.
If your caching is effective, you should see a large number of requests for home.html but very few for the CSS or JS. Somewhere in between is when you see an identical number of requests for all three, but the CSS and JS return 304s. The worst case is when you only see 200s.
Obviously, you have to know your application to do such a study. The JS and CSS files may be shared across multiple pages, which may complicate your analysis, but the general idea still holds.
The advantage of such a study is that you can find out how effective your caching strategy is for your users, as opposed to "Is caching working on my machine?". However, this is no substitute for using an HTTP proxy / Fiddler.
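If it helps, here is a rough Node sketch of that aggregation, assuming a combined-format access log at a hypothetical path; the paths you group by would be your own pages and assets:

    // Tally total requests and 304s per path from an access log, to compare
    // hits on the HTML page against its CSS/JS.
    const fs = require('fs');
    const readline = require('readline');

    const counts = {};  // path -> { total, notModified }

    const rl = readline.createInterface({
      input: fs.createReadStream('/var/log/nginx/access.log'),  // assumed location
    });

    rl.on('line', (line) => {
      // Combined log format contains: "GET /path HTTP/1.1" 304
      const m = line.match(/"(?:GET|HEAD) (\S+) HTTP\/[\d.]+" (\d{3})/);
      if (!m) return;
      const [, path, status] = m;
      counts[path] = counts[path] || { total: 0, notModified: 0 };
      counts[path].total += 1;
      if (status === '304') counts[path].notModified += 1;
    });

    rl.on('close', () => {
      for (const [p, c] of Object.entries(counts)) {
        console.log(`${p}\t${c.total} requests\t${c.notModified} x 304`);
      }
    });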

An HTTP 304 response is forbidden to have a body; hence the full response isn't sent, and you just get back the headers of the 304 response. But the round trip itself isn't free, so sending proper expiration information is good practice: it improves performance by avoiding the conditional request that returns the 304 in the first place.
http://www.fiddler2.com/redir/?id=httpperf explains this topic in some detail.
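To see both behaviours side by side, here is a bare-bones Node sketch (the file name and port are made up, and the string comparison of dates is a simplification): it answers a conditional request with an empty 304, and also sends Cache-Control so a well-behaved browser can skip the round trip entirely until max-age expires.

    const http = require('http');
    const fs = require('fs');

    const FILE = './app.js';  // hypothetical asset

    http.createServer((req, res) => {
      const lastModified = fs.statSync(FILE).mtime.toUTCString();

      // The browser revalidates with If-Modified-Since; if unchanged, reply 304 only.
      if (req.headers['if-modified-since'] === lastModified) {
        res.writeHead(304);   // a 304 must not carry a body
        return res.end();
      }

      res.writeHead(200, {
        'Content-Type': 'application/javascript',
        'Last-Modified': lastModified,
        'Cache-Control': 'max-age=86400',  // lets the browser avoid even asking for a day
      });
      fs.createReadStream(FILE).pipe(res);
    }).listen(8080);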

Related

Express response interception. Use body to modify the outgoing headers

I would like to try and improve site render times by making use of preload/push headers.
We have various assets which are required up front that I would like to preload, and various assets which are marked up in data attributes etc. that will be required later via JS but not for the initial paint. It would be good to get these flowing to the client early.
Our application is a bit of a hybrid: it uses http-proxy-middleware connected to various different applications, and also renders some pages directly itself. I would like the middleware to be agnostic and work regardless of how the page is produced.
I've seen express-mung, but this doesn't hold the headers back, so it executes too late, and it works on chunked buffers anyway rather than the entire response. Next up was express-interceptor, which works perfectly for pages rendered directly in Express but causes request failures for pages run through the proxy. My next best idea is pulling apart the compression module to figure out how it works.
Does anyone have a better suggestion, or even better know of a working module for this kind of thing?
Thanks.
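For what it's worth, one way to hold the header back is to wrap res.writeHead so the extra headers are attached just before they are flushed (roughly the hook the on-headers package, which compression uses internally, provides). A rough sketch of that idea only; the asset list and paths are hypothetical, and it is untested against http-proxy-middleware:

    // Middleware factory: append Link: rel=preload entries right before headers go out.
    function preloadHeaders(assets) {
      return (req, res, next) => {
        const originalWriteHead = res.writeHead;
        res.writeHead = function (...args) {
          const links = assets.map(a => `<${a.href}>; rel=preload; as=${a.as}`);
          const existing = res.getHeader('Link');
          res.setHeader('Link', existing ? [].concat(existing, links) : links);
          return originalWriteHead.apply(this, args);
        };
        next();
      };
    }

    // Hypothetical usage, mounted before the proxy and the app's own routes:
    // app.use(preloadHeaders([
    //   { href: '/css/app.css', as: 'style' },
    //   { href: '/js/app.js', as: 'script' },
    // ]));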

Status code without requiring webRequest permission?

I'm writing a Chrome extension to fall back to the Wayback Machine when a link fails.
webNavigation seems sufficient for the DNS-lookup case, but I don't see a way to detect link failure with only webNavigation in general.
For example, http://www.google.com/adasdasdasdasdasdasd is a 404 link - but I still get webNavigation onDOMContentLoaded and onCompleted, without indication of HTTP error (no onErrorOccurred is sent).
I was really hoping to avoid needing the webRequest permission with wide-open host patterns. Is there a way to detect HTTP failure that I'm missing?
Send an XMLHttpRequest HEAD request in onBeforeNavigate and analyze the response status code in the onreadystatechange callback. If it's a 404, use chrome.tabs.update to change the tab URL.
The drawback of sending an additional request for every page is insignificant since web pages usually generate a lot more requests while loading.
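A minimal background-script sketch of that approach, assuming a Manifest V2 background page with the webNavigation and tabs permissions plus host permissions for the sites you want to check (the Wayback URL format here is just illustrative):

    chrome.webNavigation.onBeforeNavigate.addListener((details) => {
      if (details.frameId !== 0) return;  // only care about the top-level frame

      const xhr = new XMLHttpRequest();
      xhr.open('HEAD', details.url, true);
      xhr.onreadystatechange = () => {
        if (xhr.readyState !== XMLHttpRequest.DONE) return;
        if (xhr.status === 404) {
          // Fall back to the Wayback Machine for this tab.
          chrome.tabs.update(details.tabId, {
            url: 'https://web.archive.org/web/' + details.url,
          });
        }
      };
      xhr.send();
    });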

Serving images based on :foo in URL

I'm trying to limit data usage when serving images to ensure the user isn't loading bloated pages on mobile while still maintaining the ability to serve larger images on desktop.
I was looking at Twitter and noticed they append :large to the end of the URL,
e.g. https://pbs.twimg.com/media/CDX2lmOWMAIZPw9.jpg:large
I'm really just curious how this request is being handled. If you go directly to that link there are no scripts on the page, so I'm assuming it's done server-side.
Can this be done using something like Restify/Express on a Node instance? More than anything I'm really just curious how it is done.
Yes, it can be done using Express in Node. However, it can't be done using express.static(), since this is not a static request. Rather, a route handler must parse the :large suffix (it's part of the URL path, not a querystring) in order to respond dynamically with the appropriate image.
Generally the images will have already been pre-generated during the user-upload phase for a set of varying sizes (e.g. small, medium, large, original), and the function checks the querystring to determine which static request to respond with.
That is a much higher-performing solution than generating the appropriately sized image server-side from the original on every request, though sometimes that dynamic approach is necessary if the server has to produce an unbounded set of image sizes.
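A rough Express sketch of that pattern, assuming the renditions were generated at upload time (the /media route, directory layout, and size names are assumptions for illustration):

    const express = require('express');
    const path = require('path');
    const app = express();

    const SIZES = new Set(['small', 'medium', 'large', 'orig']);

    app.get('/media/:file', (req, res) => {
      // "CDX2lmOWMAIZPw9.jpg:large" -> name "CDX2lmOWMAIZPw9.jpg", size "large"
      const [name, size = 'medium'] = req.params.file.split(':');
      if (!SIZES.has(size)) return res.sendStatus(404);

      // Serve the pre-generated rendition; basename() guards against path traversal.
      res.sendFile(path.join(__dirname, 'uploads', size, path.basename(name)));
    });

    app.listen(3000);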

What's the easiest way to request a list of web pages from a web server one by one?

Given a list of URLs, how does one implement the following automated task (assuming Windows and Ubuntu are the available OSes)? Are there existing tools that can make implementing this easier, or that do it out of the box?
log in with already-known credentials
for each specified url
request page from server
wait for page to be returned (no specific max time limit)
if request times out, try again (try x times)
if server replies, or x attempts failed, request next url
end for each
// Note: this is intentionally *not* asynchronous to be nice to the web-server.
Background: I'm implementing a worker tool that will request pages from a web server so the data those pages need to crunch through will be cached for later. The worker doesn't care about the resulting pages' contents, although it might care about HTTP status codes. I've considered a phantom/casper/node setup, but am not very familiar with this technology and don't want to reinvent the wheel (even though it would be fun).
You can request pages easily with Node's built-in http module; a sketch along those lines is below.
Some people prefer the request module, available on npm (see its GitHub page).
If you need more than that, you can use PhantomJS; there are bridge modules for driving Phantom from Node.
However, you could also look at simple CLI tools for making requests, such as wget or curl.
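Here is a sketch of the sequential loop from the question using only the built-in http/https modules; the URL list and the session cookie obtained from the login step are placeholders:

    const http = require('http');
    const https = require('https');

    const URLS = ['https://example.com/page1', 'https://example.com/page2'];  // placeholder list
    const MAX_TRIES = 3;
    const SESSION_COOKIE = 'session=PLACEHOLDER';  // obtained from the login step

    function fetchOnce(url) {
      return new Promise((resolve, reject) => {
        const lib = url.startsWith('https') ? https : http;
        const req = lib.get(url, { headers: { Cookie: SESSION_COOKIE } }, (res) => {
          res.resume();  // drain the body; only the status code matters here
          res.on('end', () => resolve(res.statusCode));
        });
        req.on('error', reject);
      });
    }

    (async () => {
      for (const url of URLS) {                        // one URL at a time, never in parallel
        for (let attempt = 1; attempt <= MAX_TRIES; attempt++) {
          try {
            console.log(`${url} -> ${await fetchOnce(url)}`);
            break;                                     // got a reply; move on to the next URL
          } catch (err) {
            console.warn(`${url} attempt ${attempt} failed: ${err.message}`);
          }
        }
      }
    })();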

How does the browser know a web page has changed?

This is a dangerously easy-sounding thing I feel I should know more about, but I don't, and I can't find much about it.
The question is: How exactly does a browser know a web page has changed?
Intuitively I would say that F5 refreshes the cache for a given page, that the cache is used for history navigation only, and that it has an expiration date. That leads me to think the browser never really knows whether a web page has changed, and it just reloads the page once the cache is gone. But I am sure this is not always the case.
Any pointers appreciated!
Browsers will usually get this information through HTTP headers sent with the page.
For example, the Last-Modified header tells the browser when the page last changed. A browser can send a simple HEAD request to the page to get the Last-Modified value; if it's newer than what the browser has in cache, the browser can reload the page.
There are a bunch of other headers related to caching as well (like Cache-Control). Check out: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
Don't guess; read the docs. Here's a friendly, but authoritative introduction to the subject.
Web browsers send HTTP requests and receive HTTP responses, then display the contents of those responses. Typically a response contains HTML, and many HTML elements trigger further requests to fetch the various parts of the page; each image, for example, is usually a separate HTTP request.
There are HTTP headers that indicate whether a page has changed, for example the Last-Modified date. Web browsers typically use a conditional GET (with a conditional header field such as If-Modified-Since) or a HEAD request to detect changes. A HEAD request returns only the headers, not the resource itself.
A conditional GET HTTP request will return a status of 304 Not Modified if there are no changes.
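For example, a conditional GET from Node might look like this (the host, path, and date are illustrative); the server answers 304 Not Modified, with no body, when nothing has changed:

    const https = require('https');

    https.get({
      hostname: 'example.com',
      path: '/index.html',
      headers: { 'If-Modified-Since': 'Wed, 01 Jan 2020 00:00:00 GMT' },
    }, (res) => {
      if (res.statusCode === 304) {
        console.log('Not modified: reuse the cached copy');
      } else {
        console.log(`Changed (HTTP ${res.statusCode}): download the new page`);
      }
      res.resume();  // discard any body
    });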
The page can then later change based on:
User input
After user input, changes can happen through JavaScript code running without a postback
After user input, a new request can be sent to the server to fetch a whole new (possibly identical) page.
JavaScript code can also run once a page is already loaded and change things at any time. For example, you may have a timer that changes something on the page.
Some pages also contain HTML tags that will scroll or blink or have other behavior.
You're on the right track, and as Jonathan mentioned, nothing is better than reading the docs. However, if you just want a bit more information:
There are HTTP response headers that let the server set the cacheability of a page, which lines up with your expiration-date intuition. One other important construct is the HTTP HEAD request, which retrieves only the headers for a given page (such as the MIME type, Content-Length, and Last-Modified, where available) without the body. Browsers can use a HEAD request to validate what is in their caches...
There is definitely more info on the subject though, so I would suggest reading the docs...
