Linux: How to prevent a file backed memory mapping from causing access errors (SIGBUS etc.)?

Linux: How to prevent a file backed memory mapping from causing access errors (SIGBUS etc.)? - linux

I want to write a wrapper for memory mapped file io, that either fails to map a file or returns a mapping that is valid until it is unmapped. With plain mmap, problems arise, if the underlying file is truncated or deleted while being mapped, for example. According to the linux man page of mmap SIGBUS is received if memory beyond the new end of the file is accessed after a truncate. It is no option to catch this signal and handle the error this way.
My idea was to create a copy of the file and map the copy. On a cow capable file system, this would impose little overhead.
But the problem is: how do I protect the copy from being manipulated by another process? A tempfile is no real option, because in theory a malicious process could still mutate it. I know that there are file locks on Linux, but as far as I understood they're either optional or don't prevent others from deleting the file.
I'm asking for two kinds of answers: Either a way to mmap a file in a rock solid way or a mechanism to protect a tempfile fully from other processes. But maybe my whole way of approaching the problem is wrong, so feel free to suggest radical solutions ;)

You can't prevent a skilled and determined user from intentionally shooting themselves in the foot. Just take reasonable precautions so it doesn't happen accidentally.
Most programs assume the input file won't change and that's usually fine
Programs that want to process files shared with cooperative programs use file locking
Programs that want a private file will create a temp file, snapshot or otherwise -- and if they unlink it for auto-cleanup, it's also inaccessible via the fs
Programs that want to protect their data from all regular user actions will run as a dedicated system account, in which case chmod is protection enough.
Anyone with access to the same account (or root) can interfere with the program with a simple kill -BUS, chmod/truncate, or any of the fancier foot-guns like copying and patching the binary, cloning its FDs, or attaching a debugger. If that's what they want to do, it's not your place to stop them.

Related

How to safely use mmap() for reading?

I have a need to do a lot of random-access reads in a large file so I use mmap(). This solution seems to be perfect as long as the mapped file is untouched. But this is not always the case. If the mapped file is tampered with, several problems arise:
If a change to a file reduces its length or the file becomes inaccessible then a process in order to live must handle SIGBUS signal (at least, in Linux implementation). This adds additional complications since I'm writing a library.
To make things even worse, mmap() manpage says it is unspecified if changes to the original
file are propagated to the memory. So they can very well be propagated.
This essentially means the contents of the file I work with can become white noise at any moment.
Does all of this mean that any program that maps a freely accessible file and does not handle these problems can be brought down by a DoS attack? Even while I do not expect evil hackers to go after my program, I can easily see a user modifying my mapped file, replacing it with another one or making the file inaccessible by, for example, removing a USB drive. And while I can write a signal handler (and this is a bit messy, so I am looking for a better solution) to solve the first problem,
I have no idea how to solve the second one.
The file can not be copied and can be freely moved around if it's not used by a program (just like any other media file). Linux file locks do not always work.
So, how to safely use mmap() for reading?

Linux kernel : logging to a specific file

I am trying to edit the linux kernel. I want some information to be written out to a file as a part of the debugging process. I have read about the printk function. But i would like to add text to a particular file (file other from the default files that keep debug logs).
To cut it short: I would kind of like to specify the "destination" in the printk function (or at least some work-around it)
How can I achieve this? Will using fwrite/fopen work (if yes, will it work without causing much overhead compared to printk, since they are implemented differently)?
What other options do i have?

Using fopen and fwrite will certainly not work. Working with files in kernel space is generally a bad idea.
It all really depends on what you are doing in the kernel though. In some configurations, there may not even be a hard disk for you to write to. If however, you are working at a stage where you can have certain assumptions about the running kernel, you probably actually want to write a kernel module rather than edit the kernel itself. For all you care, a kernel module is just as good as any other part of the kernel, but they are inserted when the kernel is already up and running.
You may also be thinking of doing so for debugging, or have output of a kernel-level application (e.g. an application that you are forced to run at kernel level for real-time constraints etc). In that case, kio may be of interest to you, but if you want to use it, do make sure you understand why.
kio is a library I wrote just for those "kernel-level applications", which makes a kernel module see a /proc file as if it's a user of it (rather than a provider). To make it work, you should have a user-space application also opening that virtual file and redirect it to wherever you want to write your log. Something along the lines of opening the file with kopen in write mode and in user space tell cat /proc/your_file > ~/log_file.
Note: I still recommend printk unless you really know what you are doing. Since you are thinking of fopen in kernel space, I don't think you really know what you are doing.

using files as IPC on linux

I have one writer which creates and sometimes updates a file with some status information. The readers are implemented in lua (so I got only io.open) and possibly bash (cat, grep, whatever). I am worried about what would happen if the status information is updated (which means a complete file rewrite) while a reader has an open handle to the file: what can happen? I have also read that if the write/read operation is below 4KB, it is atomic: that would be perfectly fine for me, as the status info can fit well in such dimension. Can I make this assumption?

A read or write is atomic under 4Kbytes only for pipes, not for disk files (for which the atomic granularity may be the file system block size, usually 512 bytes).
In practice you could avoid bothering about such issues (assuming your status file is e.g. less than 512 bytes), and I believe that if the writer is opening and writing quickly that file (in particular, if you avoid open(2)-ing a file and keeping the opened file handle for a long time -many seconds-, then write(2)-ing later -once, a small string- inside it), you don't need to bother.
If you are paranoid, but do assume that readers are (like grep) opening a file and reading it quickly, you could write to a temporary file and rename(2)-ing it when written (and close(2)-ed) in totality.
As Duck suggested, locking the file in both readers and writers is also a solution.

I may be mistaken, in which case someone will correct me, but I don't think the external readers are going to pay any attention to whether the file is being simultaneously updated. They are are going to print (or possibly eof or error out) whatever is there.
In any case, why not avoid the whole mess and just use file locks. Have the writer flock (or similar) and the readers check the lock. If they get the lock they know they are ok to read.

store some data in the struct inode

Hello I am a newbie to kernel programming. I am writing a small kernel module
that is based on wrapfs template to implement a backup mechanism. This is
purely for learning basis.
I am extending wrapfs so that when a write call is made wrapfs transparently
makes a copy of that file in a separate directory and then write is performed
on the file. But I don't want that I create a copy for every write call.
A naive approach could be I check for existence of file in that directory. But
I think for each call checking this could be a severe penalty.
I could also check for first write call and then store a value for that
specific file using private_data attribute. But that would not be stored on
disk. So I would need to check that again.
I was also thinking of making use of modification time. I could save a
modification time. If the older modification time is before that time then only
a copy is created otherwise I won't do anything. I tried to use inode.i_mtime
for this but it was the modified time even before write was called, also
applications can modify that time.
So I was thinking of storing some value in inode on disk that indicates its
backup has been created or not. Is that possible? Any other suggestions or
approaches are welcome.

You are essentially saying you want to do a Copy-On-Write virtual filesystem layer.
IMO, some of these have been done, and it would be easier to implement these in userland (using libfuse and the fuse module, e.g.). That way, you can be king of your castle and add your metadata in any which way you feel is appriate:
just add (hidden) metadata files to each directory
use extended POSIX attributes (setfattr and friends)
heck, you could even use a sqlite database
If you really insist on doing these things in-kernel, you'll have a lot more work since accessing the metadata from kernel mode is goind to take a lot more effort (you'd most likely want to emulate your own database using memory mapped files so as to minimize the amount of 'userland (style)' work required and to make it relatively easy to get atomicity and reliability right1.
1
On How Everybody Gets File IO Wrong: see also here

You can use atime instead of mtime. In that case setting S_NOATIME flag on the inode prevents it from updating (see touch_atime() function at the inode.c). The only thing you'll need is to mount your filesystem with noatime option.

Does the Linux filesystem cache files efficiently?

I'm creating a web application running on a Linux server. The application is constantly accessing a 250K file - it loads it in memory, reads it and sends back some info to the user. Since this file is read all the time, my client is suggesting to use something like memcache to cache it to memory, presumably because it will make read operations faster.
However, I'm thinking that the Linux filesystem is probably already caching the file in memory since it's accessed frequently. Is that right? In your opinion, would memcache provide a real improvement? Or is it going to do the same thing that Linux is already doing?
I'm not really familiar with neither Linux nor memcache, so I would really appreciate if someone could clarify this.

Yes, if you do not modify the file each time you open it.
Linux will hold the file's information in copy-on-write pages in memory, and "loading" the file into memory should be very fast (page table swap at worst).
Edit: Though, as cdhowie points out, there is no 'linux filesystem'. However, I believe the relevant code is in linux's memory management, and is therefore independent of the filesystem in question. If you're curious, you can read in the linux source about handling vm_area_struct objects in linux/mm/mmap.c, mainly.

As people have mentioned, mmap is a good solution here.
But, one 250k file is very small. You might want to read it in and put it in some sort of memory structure that matches what you want to send back to the user on startup. Ie, if it is a text file an array of lines might be a good choice, etc.

The file should be cached, but make sure the noatime option is set on the mount, otherwise the access time will attempt to be saved to the file, invalidating the cache.

Yes, definitely. It will keep accessed files in memory indefinitely, unless something else needs the memory.
You can control this behaviour (to some extent) with the fadvise system call. See its "man" page for more details.
A read/write system call will still normally need to copy the data, so if you see a real bottleneck doing this, consider using mmap() which can avoid the copy, by mapping the cache pages directly into the process.

I guess putting that file into ramdisk (tmpfs) may make enough advantage without big modifications. Unless you are really serious about response time in microseconds unit.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string