Does the Linux filesystem cache files efficiently? - linux

I'm creating a web application running on a Linux server. The application is constantly accessing a 250K file - it loads it in memory, reads it and sends back some info to the user. Since this file is read all the time, my client is suggesting to use something like memcache to cache it to memory, presumably because it will make read operations faster.
However, I'm thinking that the Linux filesystem is probably already caching the file in memory since it's accessed frequently. Is that right? In your opinion, would memcache provide a real improvement? Or is it going to do the same thing that Linux is already doing?
I'm not really familiar with neither Linux nor memcache, so I would really appreciate if someone could clarify this.

Yes, if you do not modify the file each time you open it.
Linux will hold the file's information in copy-on-write pages in memory, and "loading" the file into memory should be very fast (page table swap at worst).
Edit: Though, as cdhowie points out, there is no 'linux filesystem'. However, I believe the relevant code is in linux's memory management, and is therefore independent of the filesystem in question. If you're curious, you can read in the linux source about handling vm_area_struct objects in linux/mm/mmap.c, mainly.

As people have mentioned, mmap is a good solution here.
But, one 250k file is very small. You might want to read it in and put it in some sort of memory structure that matches what you want to send back to the user on startup. Ie, if it is a text file an array of lines might be a good choice, etc.

The file should be cached, but make sure the noatime option is set on the mount, otherwise the access time will attempt to be saved to the file, invalidating the cache.

Yes, definitely. It will keep accessed files in memory indefinitely, unless something else needs the memory.
You can control this behaviour (to some extent) with the fadvise system call. See its "man" page for more details.
A read/write system call will still normally need to copy the data, so if you see a real bottleneck doing this, consider using mmap() which can avoid the copy, by mapping the cache pages directly into the process.

I guess putting that file into ramdisk (tmpfs) may make enough advantage without big modifications. Unless you are really serious about response time in microseconds unit.

Related

Telling Linux not to keep a file in the cache when it is written to disk

I am writing a large file to disk from a user-mode application. In parallel to it, I am writing one or more smaller files. The large file won't be read back anytime soon, but the small files could be. I have enough RAM for the application + smaller files, but not enough for the large file. Can I tell the OS not to keep parts of the large file in cache after they are written to disk so that more cache is available for smaller files? I still want writes to the large file be fast enough.
Can I tell the OS not to keep parts of the large file in cache ?
Yes, you probably want to use some system call like posix_fadvise(2) or madvise(2). In weird cases, you might use readahead(2) or userfaultfd(2) or Linux-specific flags to mmap(2). Or very cleverly handle SIGSEGV (see signal(7), signal-safety(7) and eventfd(2) and signalfd(2)) You'll need to write your C program doing that.
But I am not sure that it is worth your development efforts. In many cases, the behavior of a recent Linux kernel is good enough.
See also proc(5) and linuxatemyram.com
You many want to read the GC handbook. It is relevant to your concerns
Conbsider studying for inspiration the source code of existing open-source software such as GCC, Qt, RefPerSys, PostGreSQL, GNU Bash, etc...
Most of the time, it is simply not worth the effort to explicitly code something to manage your page cache.
I guess that mount(2) options in your /etc/fstab file (see fstab(5)...) are in practice more important. Or changing or tuning your file system (e.g. ext4(5), xfs(5)..). Or read(2)-ing in large pieces (1Mbytes).
Play with dd(1) to measure. See also time(7)
Most applications are not disk-bound, and for those who are disk bound, renting more disk space is cheaper that adding and debugging extra code.
don't forget to benchmark, e.g. using strace(1) and time(1)
PS. Don't forget your developer costs. They often are a lot above the price of a RAM module (or of some faster SSD disk).

Linux: How to prevent a file backed memory mapping from causing access errors (SIGBUS etc.)?

I want to write a wrapper for memory mapped file io, that either fails to map a file or returns a mapping that is valid until it is unmapped. With plain mmap, problems arise, if the underlying file is truncated or deleted while being mapped, for example. According to the linux man page of mmap SIGBUS is received if memory beyond the new end of the file is accessed after a truncate. It is no option to catch this signal and handle the error this way.
My idea was to create a copy of the file and map the copy. On a cow capable file system, this would impose little overhead.
But the problem is: how do I protect the copy from being manipulated by another process? A tempfile is no real option, because in theory a malicious process could still mutate it. I know that there are file locks on Linux, but as far as I understood they're either optional or don't prevent others from deleting the file.
I'm asking for two kinds of answers: Either a way to mmap a file in a rock solid way or a mechanism to protect a tempfile fully from other processes. But maybe my whole way of approaching the problem is wrong, so feel free to suggest radical solutions ;)
You can't prevent a skilled and determined user from intentionally shooting themselves in the foot. Just take reasonable precautions so it doesn't happen accidentally.
Most programs assume the input file won't change and that's usually fine
Programs that want to process files shared with cooperative programs use file locking
Programs that want a private file will create a temp file, snapshot or otherwise -- and if they unlink it for auto-cleanup, it's also inaccessible via the fs
Programs that want to protect their data from all regular user actions will run as a dedicated system account, in which case chmod is protection enough.
Anyone with access to the same account (or root) can interfere with the program with a simple kill -BUS, chmod/truncate, or any of the fancier foot-guns like copying and patching the binary, cloning its FDs, or attaching a debugger. If that's what they want to do, it's not your place to stop them.

Can mmap and O_DIRECT be used together?

As I understand it, when you mmap a file you are basically mapping the pages from the page cache for that file directly into your process, and when you use O_DIRECT you are bypassing the page cache. Does it ever make sense to use the two together? If my understanding is right how would it even work? mmap seems to rely on the file being in the page cache and O_DIRECT seems to prevent it from going there (assuming nothing else on the system has the file open). I found this question but the answerer seems to think it's perfectly normal to do.
I think it would not have a lot of sense.
O_DIRECT means that all I/O should be reflected in storage, as soon as possible (without cache).
The mapped pages is a copy of storage (file) in the memory. Reflecting every read from and write to memory would have to do some I/O and this would be huge performance hit.

Is there a way to show linux buffer cache misses?

I am trying to measure the effects of adding memory to a LAMP server.
How can I find which processes try to read from the Linux buffer cache, but miss and read from disk instead?
SystemTap is one of the best ways to do this, but fair warning it's difficult to get a great answer. The kernel simply doesn't provide this data directly. You have to infer it based on how many times the system requested a read and how many times a disk was read from. Usually they line up fairly well and you can attribute the difference to the VFS cache, but not always. One problem is LVM- LVM is a "block device", but so is the underlying disk(s), so if you're not careful it's easy to double-count the disk reads.
A while back I took a stab at it and wrote this:
https://sourceware.org/systemtap/wiki/WSCacheHitRate
I do not claim that it is perfect, but it works better than nothing, and usually generates reasonable output as long as the environment is fairly "normal". It does attempt to account for LVM in a fairly crude way.

Can running 'cat' speed up subsequent file random access on a linux box?

on a linux box with plenty of memory (a few Gigs), I need to access randomly to a big file as fast as possible.
I was thinking about doing a cat myfile > /dev/null before accessing it so my file pages go in memory sequentially, hence faster than with a dry random access.
Does this approach make sense to you?
While doing that may force the contents of the file into the system's cache, you are better off using posix_fadvise() (with the POSIX_FADV_WILLNEED advice) or the (blocking)readahead() call to make the kernel precache the data you will need.
EDIT:
You might also want to try using the POSIX_FADV_RANDOM advice to disable readahead altogether.
There's an article with a decent explanation of usage here: Advising the Linux Kernel on File I/O
As the others said, you'll need to benchmark it in your particular case.
It is quite possible it will result in a significant performance increase though.
On traditional rotating media (i.e. a hard disk) sequential access (cat file > /dev/null/fadvise) is much faster than random access.
Only one way to be sure that any (possibly premature?) optimization is worthwhile: benchmark it.
It could theoretically speed up the access (especially if you access almost everything from the file), but I wouldn't bet on a big difference.
The only really useful approach is to benchmark it for your specific case.
If you really want the speed I'd recommend trying memory-mapped IO instead of trying to hack something up with cat. Of course, it depends on the size of file you're trying to access and the type of access you want.. this may not be possible...
readahead is a good call too...
Doing "cat" on a big file might bring the data in and blow more valuable data out of the cache; this is not what you want.
If performance is at all important to you, you'll be doing regular performance testing anyway (and soak tests etc), so continue to do that and watch your graphs, figures etc.

Resources