How to display the age of an nginx cached file in headers - linux

I've set up a caching server for a site through nginx 1.6.3 on CentOS 7, and it's configured to add http headers to served files to show if said files came from the caching server (HIT, MISS, or BYPASS) like so:
add_header X-Cached $upstream_cache_status;
However, i'd like to see if there's a way to add a header to display the age of the cached file, as my solution has proxy_cache_valid 200 60m; set, and i'd like to check that it's respecting that setting.
So what i'm looking for would be something like:
add_header Cache-Age $upstream_cache_age;
I'm unable to find anything of the sort though, can you help?
Thanks

The nginx documentation is quite exhaustive — there's no variable with the direct relative age of the cached file.
The best way would be to use the $upstream_http_ variable class to get the absolute age of the resource by picking up its Date header through $upsteam_http_date.
add_header X-Cache-Date $upstream_http_date;
For the semantic meaning of the Date header field in HTTP/1.1, refer to rfc7231#section-7.1.1.2, which describes it as the time of the HTTP response generation, so, basically, this should accomplish exactly what you want (especially if the backend runs with the same timecounter).

I spent some time attempting to solve this with the Nginx Perl module, which does not seem to have access to $upstream_http_NAME headers that would allow you to successfully calculate the current time from a timestamp header that your proxied application created during render time.
Alternatively, you could use a different caching layer architecture like Varnish Cache, which does indeed provide the Age HTTP response header:
http://book.varnish-software.com/3.0/HTTP.html#age

I made a solution that works for this, with the Lua module, in this question: Nginx: Add “Age” header, with Lua. Is this a good solution?
I'm going to post here the code, for any suggestion it would be better to discuss it in the other link, where I explain it in more detail.
map $upstream_http_Date $mapdate {
default $upstream_http_Date;
'' 'Sat, 21 Dec 2019 00:00:00 GMT';
}
Inside location:
header_filter_by_lua_block {
ngx.header["Age"] = ngx.time() - ngx.parse_http_time(ngx.var.mapdate);
}

Related

Varnish 3: Accept JSON returns HTML

I try to make an REST-API, but varnish returns always the first called response and I have no idea why.
If I open a page with a Browser, Varnish returns HTML -> is OK.
If I curl the same page curl -i https://example.com -H "Accept: application/json" Varnish also returns HTML -> which is False.
As I see, Varnish always returns the first cached item, If this is JSON varnish returns JSON, if this is HTML Varnish returns HTML.
Without Varnish everything works like expected.
If you're serving different content type on the same URL, you you might want to tell Varnish to partition cache accordingly.
In fact, Varnish doesn't do much special about it, and it behaves like other proxies would. If they see a URL without information specifying how a resource's cache should partition, then no matter if it is a JSON or a regular request, the first request will be cached and served the same irrespective of request type.
So you need to tell Varnish how to partition cache for a resource.
The "Vary" header
The most straightforward and "HTTP" compatible way for other proxies in the wild, is Vary response header.
It tells the proxy cache (Varnish in this case), to partition, vary cache for a resource based on a header value coming from a client.
E.g. client sends header X: some-value and your app sends header Vary: X is what it takes to make the cache different between different value of X.
For Varnish 3, there is an example with Accept-Encoding.
The article details an implementation challenge with Vary - different clients may be sending quite different values for varied header thus resulting in severely partitioned cache. So you typically want to normalize the varying header's value, to a set of known, expected values.
In your case you want to Vary (and normalize) the Accept header. So something along the lines of (in vcl_recv procedure):
if (req.http.Accept) {
if (req.http.Accept ~ "application/json") {
set req.http.Accept = "application/json";
} else {
set req.http.Accept = "text/html";
}
}
Next you need to have your app actually send Vary: Accept (inside your app source files). Alternatively, you can throw some Varnish VCL instead, if modiying app source files is not feasible:
sub vcl_fetch {
if (!beresp.http.Vary) { # no Vary at all
set beresp.http.Vary = "Accept";
} elseif (beresp.http.Vary !~ "Accept") { # add to existing Vary
set beresp.http.Vary = beresp.http.Vary + ", Accept";
}
}

How do I cache bust imported modules in es6?

ES6 modules allows us to create a single point of entry like so:
// main.js
import foo from 'foo';
foo()
<script src="scripts/main.js" type="module"></script>
foo.js will be stored in the browser cache. This is desirable until I push a new version of foo.js to production.
It is common practice to add a query string param with a unique id to force the browser to fetch a new version of a js file (foo.js?cb=1234)
How can this be achieved using the es6 module pattern?
There is one solution for all of this that doesn't involve query string. let's say your module files are in /modules/. Use relative module resolution ./ or ../ when importing modules and then rewrite your paths in server side to include version number. Use something like /modules/x.x.x/ then rewrite path to /modules/. Now you can just have global version number for modules by including your first module with
<script type="module" src="/modules/1.1.2/foo.mjs"></script>
Or if you can't rewrite paths, then just put files into folder /modules/version/ during development and rename version folder to version number and update path in script tag when you publish.
HTTP headers to the rescue. Serve your files with an ETag that is the checksum of the file. S3 does that by default at example.
When you try to import the file again, the browser will request the file, this time attaching the ETag to a "if-none-match" header: the server will verify if the ETag matches the current file and send back either a 304 Not Modified, saving bandwith and time, or the new content of the file (with its new ETag).
This way if you change a single file in your project the user will not have to download the full content of every other module. It would be wise to add a short max-age header too, so that if the same module is requested twice in a short time there won't be additional requests.
If you add cache busting (e.g. appending ?x={randomNumber} through a bundler, or adding the checksum to every file name) you will force the user to download the full content of every necessary file at every new project version.
In both scenario you are going to do a request for each file anyway (the imported files on cascade will produce new requests, which at least may end in small 304 if you use etags). To avoid that you can use dynamic imports e.g if (userClickedOnSomethingAndINeedToLoadSomeMoreStuff) { import('./someModule').then('...') }
From my point of view dynamic imports could be a solution here.
Step 1)
Create a manifest file with gulp or webpack. There you have an mapping like this:
export default {
"/vendor/lib-a.mjs": "/vendor/lib-a-1234.mjs",
"/vendor/lib-b.mjs": "/vendor/lib-b-1234.mjs"
};
Step 2)
Create a file function to resolve your paths
import manifest from './manifest.js';
const busted (file) => {
return manifest[file];
};
export default busted;
Step 3)
Use dynamic import
import busted from '../busted.js';
import(busted('/vendor/lib-b.mjs'))
.then((module) => {
module.default();
});
I give it a short try in Chrome and it works. Handling relative paths is tricky part here.
I've created a Babel plugin which adds a content hash to each module name (static and dynamic imports).
import foo from './js/foo.js';
import('./bar.js').then(bar => bar());
becomes
import foo from './js/foo.abcd1234.js';
import('./bar.1234abcd.js').then(bar => bar());
You can then use Cache-control: immutable to let UAs (browsers, proxies, etc) cache these versioned URLs indefinitely. Some max-age is probably more reasonable, depending on your setup.
You can use the raw source files during development (and testing), and then transform and minify the files for production.
what i did was handle the cache busting in webserver (nginx in my instance)
instead of serving
<script src="scripts/main.js" type="module"></script>
serve it like this where 123456 is your cache busting key
<script src="scripts/123456/main.js" type="module"></script>
and include a location in nginx like
location ~ (.+)\/(?:\d+)\/(.+)\.(js|css)$ {
try_files $1/$2.min.$3 $uri;
}
requesting scripts/123456/main.js will serve scripts/main.min.js and an update to the key will result in a new file being served, this solution works well for cdns too.
Just a thought at the moment but you should be able to get Webpack to put a content hash in all the split bundles and write that hash into your import statements for you. I believe it does the second by default.
You can use an importmap for this purpose. I've tested it at least in Edge. It's just a twist on the old trick of appending a version number or hash to the querystring. import doesn't send the querystring onto the server but if you use an importmap it will.
<script type="importmap">
{
"imports": {
"/js/mylib.js": "/js/mylib.js?v=1",
"/js/myOtherLib.js": "/js/myOtherLib.js?v=1"
}
}
</script>
Then in your calling code:
import myThing from '/js/mylib.js';
import * as lib from '/js/myOtherLib.js';
You can use ETags, as pointed out by a previous answer, or alternatively use Last-Modified in relation with If-Modified-Since.
Here is a possible scenario:
The browser first loads the resource. The server responds with Last-Modified: Sat, 28 Mar 2020 18:12:45 GMT and Cache-Control: max-age=60.
If the second time the request is initiated earlier than 60 seconds after the first one, the browser serves the file from cache and doesn't make an actual request to the server.
If a request is initiated after 60 seconds, the browser will consider cached file stale and send the request with If-Modified-Since: Sat, 28 Mar 2020 18:12:45 GMT header. The server will check this value and:
If the file was modified after said date, it will issue a 200 response with the new file in the body.
If the file was not modified after the date, the server will issue a304 "not modified" status with empty body.
I ended up with this set up for Apache server:
<IfModule headers_module>
<FilesMatch "\.(js|mjs)$">
Header set Cache-Control "public, must-revalidate, max-age=3600"
Header unset ETag
</FilesMatch>
</IfModule>
You can set max-age to your liking.
We have to unset ETag. Otherwise Apache keeps responding with 200 OK every time (it's a bug). Besides, you won't need it if you use caching based on modification date.
A solution that crossed my mind but I wont use because I don't like it LOL is
window.version = `1.0.0`;
let { default: fu } = await import( `./bar.js?v=${ window.version }` );
Using the import "method" allows you to pass in a template literal string. I also added it to window so that it can be easily accessible no matter how deep I'm importing js files. The reason I don't like it though is I have to use "await" which means it has to be wrapped in an async method.
If you are using Visual Studio 2022 and TypeScript to write your code, you can follow a convention of adding a version number to your script file names, e.g. MyScript.v1.ts. When you make changes and rename the file to MyScript.v2.ts Visual Studio shows the following dialog similar to the following:
If you click Yes it will go ahead and update all the files that were importing this module to refer to MyScript.v2.ts instead of MyScript.v1.ts. The browser will notice the name change too and download the new modules as expected.
It's not a perfect solution (e.g. if you rename a heavily used module, a lot of files can end up being updated) but it is a simple one!
this work for me
let url = '/module/foo.js'
url = URL.createObjectURL(await (await fetch(url)).blob())
let foo = await import(url)
I came to the conclusion that cache-busting should not be used with ES Module.
Actually, if you have the versioning in the URL, the version is acting like a cache-busting. For instance https://unpkg.com/react#18.2.0/umd/react.production.min.js
If you don't have versioning in the URL, use the following HTTP header Cache-Control: max-age=0, no-cache to force the browser to always check if a new version of the file is available.
no-cache tells the browser to cache the file but to always perform a check
no-store tells the browser to don't cache the file. Don't use it!
Another approach: redirection
unpkg.com solved this problem with HTTP redirection.
Therefore it is not an ideal solution because it involves 2 HTTP requests instead of 1.
The first request is to get redirected to the latest version of the file (not cached, or cached for a short period of time)
The second request is to get the JS file (cached)
=> All JS files include the versioning in the URL (and have an aggressive caching strategy)
For instance https://unpkg.com/react#18.2.0/umd/react.production.min.js
=> Removing the version in the URL, will lead to a HTTP 302 redirect pointing to the latest version of the file
For instance https://unpkg.com/react/umd/react.production.min.js
Make sure the redirection is not cached by the browser, or cached for a short period of time. (unpkg allows 600 seconds of caching, but it's up to you)
About multiple HTTP requests: Yes, if you import 100 modules, your browser will do 100 requests. But with HTTP2 / HTTP3, it is not a problem anymore because all requests will be multiplexed into 1 (it is transparent for you)
About recursion:
If the module you are importing also imports other modules, you will want to check about <link rel="modulepreload"> (source Chrome dev blog).
The modulepreload spec actually allows for optionally loading not just the requested module, but all of its dependency tree as well. Browsers don't have to do this, but they can.
If you are using this technic in production, I am deeply interested to get your feedback!
Append version to all ES6 imports with PHP
I didn't want to use a bundler only because of this, so I created a small function that modifies the import statements of all the JS files in the given directory so that the version is at the end of each file import path in the form of a query parameter. It will break the cache on version change.
This is far from an ideal solution, as all JS file contents are verified by the server on each request and on each version change the client reloads every JS file that has imports instead of just the changed ones.
But it is good enough for my project right now. I thought I'd share.
$assetsPath = '/public/assets'
$version = '0.7';
$rii = new RecursiveIteratorIterator(new RecursiveDirectoryIterator($assetsPath, FilesystemIterator::SKIP_DOTS) );
foreach ($rii as $file) {
if (pathinfo($file->getPathname())['extension'] === 'js') {
$content = file_get_contents($file->getPathname());
$originalContent = $content;
// Matches lines that have 'import ' then any string then ' from ' and single or double quote opening then
// any string (path) then '.js' and optionally numeric v GET param '?v=234' and '";' at the end with single or double quotes
preg_match_all('/import (.*?) from ("|\')(.*?)\.js(\?v=\d*)?("|\');/', $content, $matches);
// $matches array contains the following:
// Key [0] entire matching string including the search pattern
// Key [1] string after the 'import ' word
// Key [2] single or double quotes of path opening after "from" word
// Key [3] string after the opening quotes -> path without extension
// Key [4] optional '?v=1' GET param and [5] closing quotes
// Loop over import paths
foreach ($matches[3] as $key => $importPath) {
$oldFullImport = $matches[0][$key];
// Remove query params if version is null
if ($version === null) {
$newImportPath = $importPath . '.js';
} else {
$newImportPath = $importPath . '.js?v=' . $version;
}
// Old import path potentially with GET param
$existingImportPath = $importPath . '.js' . $matches[4][$key];
// Search for old import path and replace with new one
$newFullImport = str_replace($existingImportPath, $newImportPath, $oldFullImport);
// Replace in file content
$content = str_replace($oldFullImport, $newFullImport, $content);
}
// Replace file contents with modified one
if ($originalContent !== $content) {
file_put_contents($file->getPathname(), $content);
}
}
}
$version === null removes all query parameters of the imports in the given directory.
This adds between 10 and 20ms per request on my application (approx. 100 JS files when content is unchanged and 30—50ms when content changes).
Use of relative path works for me:
import foo from './foo';
or
import foo from './../modules/foo';
instead of
import foo from '/js/modules/foo';
EDIT
Since this answer is down voted, I update it. The module is not always reloaded. The first time, you have to reload the module manually and then the browser (at least Chrome) will "understand" the file is modified and then reload the file every time it is updated.

Remove the Server response header in Yesod/Warp

How can I remove the Server HTTP response header in Yesod? I found code that's responsible for setting that header, but I don't know what to do next. I know that I can replace the header value with an empty string by using addHeader "Server" "", but I'd prefer to remove it entirely.
I made an issue on GitHub Warp repository and they changed it that when the server name is empty, the "Server" header is not sent. Therefore, the solution is to set the server name to an empty string using setServerName "". In my case I had to add this to the warpSettings function in Application.hs. Note that you have to use the Warp version which contains the fix (as of May 3 '17, it has not been released yet, but you can pull it directly from GitHub).
You must call the methods inside of the function you linked. That function will "The Date and Server header is added if not exist in HTTP response header" so you need to reimplement it if you don't want that behavior.
This is why people always say to keep your code modular and your functions small; this function is too big for your use case, and there is no specific smaller function that does exactly what you want (or else it would have been called by this function!)

How to re-order HTTP headers?

I was wondering if there was any way to re-order HTTP headers that are being sent by our browser, before getting sent back to the web server?
Since the order of the headers leaves some kind of "fingerprinting", see this post and this post, I was thinking about using MITMProxy (with Inline Scripting, I guess) to modify headers on-the-fly. Is this possible?
How would one achieve that?
Note: I'm looking for a method that could be scripted, not a method using a graphical tool like the Burp Suite (although Burp is known to be able to re-order headers)
I'm open to suggestions. Perhaps NGINX might come to the rescue as well?
EDIT: I should be more specific, by giving an example...
Let's say I'm using Firefox. With the use of a funky add-on, I'm spoofing my user-agent to "look" like a Chrome browser. But then if I test my browser with ip-check.info, the "signature" of my browser remains the one of Firefox, even though my spoofed user-agent shows "Chrome".
So the solution, in this specific case, should be to re-order the HTTP headers in the same manner as Chrome does.
How can this be done?
For the record, the order of the HTTP headers should not matter at all according to RFC 7230. But now that you have asked... this can be done in mitmproxy as follows:
import random
def request(context, flow):
# flow.request.headers.fields is a tuple of (name, value) header tuples.
h = list(flow.request.headers.fields)
random.shuffle(h)
flow.request.headers.fields = tuple(h)
See the mitmproxy documentation on netlib.http.Headers for more details.
There are tons of way to reorder them as you wish:
def reorder(headers, header_order=["Host","User-Agent","Accept"]):
lines = []
for name in header_order: # add existing headers in the specified order
if name in headers:
lines.extend(headers.get_all(name))
del headers[name]
lines.extend(headers.fields) # all other headers
return lines
request.headers.fields = reorder(request.headers)

HTTUrlConnection Set Header doesnt work

I have encounter recently an interesting problem.
I am trying to access sametime by using the integrated REST API. To do that i wanted to prepare an XAgent that is doing the lookup and data connection for me.
The first two steps to connect to the Sametimeserver work perfectly fine but i have a problem with the last step. Regardless what i do i cant set the header of the GET request. I tried it with other fields then one mentioned below but it looks like its not setting the header.
Anybody any idea why setting the header in SSJS doesnt work?
var url = new java.net.URL("http://oursametimeserver/stwebapi/RTCServlet?"+sid);
conn= url.openConnection();
conn.setRequestProperty("Rtc4web-Nonce",pid);
conn.setRequestMethod("GET");
writer.write(#Implode(conn.getHeaderFields()));
Please see the results:
{null=[HTTP/1.1 400 Bad Request], Cache-Control=[no-cache="set-cookie, set-cookie2"], Expires=[Thu, 01 Dec 1994 16:00:00 GMT], X-Powered-By=[Servlet/3.0], Content-Length=[170], Content-Language=[en-US], Content-Type=[application/json], Connection=[Close], Date=[Mon, 09 Mar 2015 19:18:54 GMT], Set-Cookie=[JSESSIONID=0000zwXn8VhNWlZ78jN4yfMJQrU:-1; Path=/; HttpOnly]}
Please ignore the Error 400. The rest api returns it because i am not submitting the RTC4WEB-NONCE field in the header. I get the same result when i use POSTMAN in chrome. With that value everything is fine.
You need to change your approach slightly:
1) Write a small Java class that wraps all the call to Java objects, so you can call that one with a simple JS call. It takes the "map a untyped js variable to a typed Java method" guesswork out of the picture
2) Don't use the HttpUrlConnection class. Either use the ApacheHttp Client which is both available and has methods to set the header - or use the social business toolkit that has ready functions to connect to Sametime

Resources