Are there any known web browser implementations (or browser options) that explicitly do not cache pages (and other data) to disk?
I'm looking for an implementation that only caches to memory (obviously the cache is discarded when the process exits). I'm ignoring external disk write factors such as OS paging.
It's possible to disable the cache in every major browser, which solves your first problem.
If you want caching in memory, you can turn caching back on, configure a ramdisk, and point the cache path at that ramdisk. In Firefox, for example, you can change browser.cache.disk.parent_directory in about:config to achieve this - and I'm pretty sure other major browsers have similar options.
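For Firefox specifically, a memory-only setup can also be approximated without a ramdisk by flipping the cache prefs in about:config. These pref names are from Firefox's about:config; exact names and defaults can vary between versions, so treat this as a sketch:

```
browser.cache.disk.enable = false      // don't write cache entries to disk
browser.cache.memory.enable = true     // keep caching, but only in RAM
browser.cache.memory.capacity = 65536  // optional: RAM cache size in KB (-1 = automatic)
```

The RAM cache is discarded when the browser process exits, which matches what the question asks for.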
Linux exploits remaining memory for its file cache
In my application (written in C++), I'd like to explicitly flush the dirty pages to disk from time to time.
(Using O_DIRECT is not appropriate for me.)
I tried fflush(), but it is not what I want - it only flushes the user-space stdio buffers, not the kernel's dirty pages.
Is there any way to flush the dirty pages of the OS file cache to disk?
Thanks
You can use sync_file_range() to encourage flushing on Linux, but confusingly you can't use sync_file_range() to guarantee file durability/data integrity - it is simply a hint that might help get flushing underway (see this Linux Plumbers Conference 2019 video of Postgres developer Andres Freund complaining about sync_file_range()'s manpage, and the reply from filesystem developer Jan Kara). In short: it can help trigger flushing, but you'll need to add/use something else to guarantee durability.
I believe all the usual file descriptor sync style calls (fsync(), fdatasync() etc.) also hint that you want writeback to start occurring, but in a more heavy-handed fashion compared to sync_file_range() (because they also force flushing of the device's volatile write caches)...
I'm curious if there's any advantages in loading my website in to a huge global object (containing file content, file names and so on..) at startup.
Is there a speed advantage (considering such a massive Object)?
Is there a maximum size of a string or an object?
Do the files need to be encoded?
How will this affect my server RAM?
I'm aware that all files will be cached and I will need to reload parts of the object whenever a file is edited.
1) Yes, there is an obvious benefit: reading from RAM is faster than reading from disk (http://norvig.com/21-days.html#answers)
2) Every time you read a file from the filesystem with Node, you get back a Buffer object. Buffer objects are stored outside of the JS heap, so you're not limited by the total v8 heap size. However, each Buffer has a limit of 1 GB in size (this is changing: https://twitter.com/trevnorris/status/603345087028793345). Obviously, the total limit is the limit of your process (see ulimit) and of your system as a whole.
3) That's up to you. If you just read the files as Buffers, you don't need to specify an encoding. It's just raw memory.
Other thoughts:
You should be aware that file caching is already happening in the Kernel by ways of the page cache. Every time you read a file from the filesystem, you're not necessarily incurring a disk seek/read.
You should benchmark your idea vs just reading from the filesystem and see what the gains are. If you're saving 10ms but it still takes > 150ms for a user to retrieve the web page over the network, it's probably a waste of time.
It's going to be a lot of programming work to load all of your static assets into some sort of in-memory object and then serve them from Node. I don't know of any web frameworks with built-in facilities for this, and you'd probably end up poorly reinventing a whole bunch of wheels... so no, there's no real advantage in doing this.
Web servers like Apache handle caching files really well, if set up to do so. You can use one as a proxy for Node. They also access the file system much more quickly than Node does. Using a proxy essentially implements most of the in-memory solution you're interested in.
Proper use of expiration headers will ensure that clients won't request unchanging assets unnecessarily. You can also use a content delivery network, like Akamai, to serve static assets from servers closer to your users. Both of these approaches mean that clients never even hit your server, though a CDN will cost you.
Serving files isn't terribly expensive as compared to sending them down the wire or doing things like querying a database.
Use a web server to proxy your static content. Then make sure client-side caching policies are set up correctly. Finally, consider a content delivery network. Don't reinvent the wheel!
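As a concrete sketch of the "proxy plus caching headers" setup described above (nginx is used here as one possible front-end server; the paths, port, and expiry values are all assumptions, not from the question):

```nginx
# Sketch: nginx in front of a Node app, serving static files directly
# with far-future expiry headers. Adjust paths/port to your setup.
server {
    listen 80;

    location /static/ {
        root /var/www/myapp;    # example document root
        expires 30d;            # sets Cache-Control/Expires for clients
        add_header Cache-Control "public";
    }

    location / {
        proxy_pass http://127.0.0.1:3000;   # example Node upstream
    }
}
```

With this in place, static assets never reach the Node process at all, and repeat visitors never reach nginx either until the expiry lapses.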
AFAIK all disk reads on Linux go through the page cache.
Is there a way to prevent reads (done by a backup process) from getting into the page cache?
Imagine:
A server runs fine, since most operations don't need to touch the disk, since enough memory is available.
Now the backup process starts and does a lot of reading. The read bytes get into the memory (page cache) although nobody wants to read the same bytes again in the next hours.
The backup data fills up the memory and more important pages from the cache get dropped.
Server performance gets worse since more operations need to touch the disk, because the relevant pages were dropped from the cache.
My preferred solution:
Tell Linux that the reads done by the backup process don't need to be stored in the page cache.
If you're using rsync, there is a --drop-cache flag, according to this question.
There is also the nocache utility, which
minimizes "the effect an application has on the Linux file system cache"
Use case: backup processes that should not interfere with the present state of the cache.
With dd, you can use direct I/O to bypass the cache, according to this question.
dd also has a nocache option; check info coreutils 'dd invocation' for details.
The kernelmode static content cache is part of HTTP.SYS and is rather straightforward to configure. When debugging, you can always inspect the contents of the kernelmode cache with the command:
netsh http show cachestate
But the usermode cache in IIS is much harder to debug. There seems to be no way to inspect it at all. Is there a simple way to see which files are in the usermode cache inside the web application's worker process?
In our project we have an ISAPI module that does introspection into the requested files before returning them to the browser. Because of this we cannot use the standard static content cache in IIS. We are having problems with caching, where IIS stops adding new items to the cache after a short warm-up period. Items get flushed, but no new files seem to get cached. According to perfmon, the number of items in the cache falls to a handful. It would be very valuable for us to be able to see exactly which files are in the cache at any given time.
I currently use Berkeley DBs fronted by a Java server for a high-performance disk-backed cache. Provided you warm it up before allowing it to face live traffic, your update rate is low, and your working set fits in memory, the Linux buffer cache does an excellent job. It's measurably faster than memcached, in part because you don't need to context switch to the memcached process and back on every read. We're very happy with the performance.
We're going to be adding some data to the cache that we're not comfortable leaving on disk in plain text. We've measured and are unhappy with the performance of decrypting during request processing, so we're looking for solutions that decrypt only when the data is loaded from disk and then keep it available in memory.
Before building something that does this, I wanted to find out if we can simply slide in an encrypted filesystem and continue to rely on the OS to manage the cache for us. I haven't found any documentation that tells me at what layer the decryption is done.
So my question is: Can anyone tell me, for any particular Linux encrypted FS, whether the (en|de)cryption is done below the buffer cache (and therefore the cache contains plaintext) or above (and the cache contains ciphertext)?
The buffer cache sits below the actual filesystem, so it caches encrypted data. See the diagram in IBM's Anatomy of a Filesystem. Since you want unencrypted data cached: as long as your encrypted filesystem was created using the 'loop' device, the buffer cache will also contain an unencrypted copy of your data, so it should be fast (at the cost of more memory for in-use FS buffers).
I haven't played with this, but am pretty sure that buffer cache and VM are not aware of the encryption, so you should see comparable performance with your usage.