What happens in Linux when opening a file?

When using the open() function in Linux to open a file, is it true that the OS brings all of the file's blocks into the cache?

AFAIK, the kernel does not systematically bring all of a file's blocks into its page cache on open(2) (in particular, that could not work for files bigger than available RAM).
But it may bring some of them. I guess that for most (small) files, perhaps all blocks could be read ahead. But I could be wrong, and it is highly system specific (and configuration specific too).
See also the O_DIRECT flag to open(2), posix_fadvise(2), and the options for mount(8).
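If you do want the kernel to prefetch a file, posix_fadvise(2) can ask for that explicitly; conversely, opening with O_DIRECT bypasses the page cache entirely (with the alignment constraints described in open(2)). Here is a minimal sketch in C; big.dat is just a placeholder name:

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void) {
        int fd = open("big.dat", O_RDONLY);   /* placeholder file name */
        if (fd < 0) { perror("open"); return 1; }

        /* Hint: start reading the whole file (offset 0, length 0 = to EOF)
           into the page cache. It is only advice; the kernel may ignore it
           and will not cache more than free RAM allows. */
        int err = posix_fadvise(fd, 0, 0, POSIX_FADV_WILLNEED);
        if (err != 0)
            fprintf(stderr, "posix_fadvise: %s\n", strerror(err));

        /* ... subsequent read()s are then likely to hit the page cache ... */

        close(fd);
        return 0;
    }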

Related

Telling Linux not to keep a file in the cache when it is written to disk

I am writing a large file to disk from a user-mode application. In parallel to it, I am writing one or more smaller files. The large file won't be read back anytime soon, but the small files could be. I have enough RAM for the application + smaller files, but not enough for the large file. Can I tell the OS not to keep parts of the large file in cache after they are written to disk so that more cache is available for smaller files? I still want writes to the large file be fast enough.
Can I tell the OS not to keep parts of the large file in cache ?
Yes, you probably want to use a system call like posix_fadvise(2) or madvise(2). In unusual cases, you might use readahead(2), userfaultfd(2), or Linux-specific flags to mmap(2), or very cleverly handle SIGSEGV (see signal(7), signal-safety(7), eventfd(2) and signalfd(2)). You'll need to write your C program doing that.
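For the "don't keep the large file in the cache" case specifically, the usual pattern is: write a chunk, sync it, then advise POSIX_FADV_DONTNEED on the range just written. A rough sketch (the helper name write_and_drop is mine, not a standard API):

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* Write one chunk of the large file, then ask the kernel to drop the
       just-written range from the page cache. fdatasync() first, because
       POSIX_FADV_DONTNEED only evicts pages that are already clean. */
    static int write_and_drop(int fd, const void *buf, size_t len, off_t offset) {
        if (pwrite(fd, buf, len, offset) != (ssize_t)len)
            return -1;
        if (fdatasync(fd) != 0)
            return -1;
        int err = posix_fadvise(fd, offset, len, POSIX_FADV_DONTNEED);
        if (err != 0)
            fprintf(stderr, "posix_fadvise: %s\n", strerror(err));
        return 0;
    }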
But I am not sure that it is worth your development efforts. In many cases, the behavior of a recent Linux kernel is good enough.
See also proc(5) and linuxatemyram.com
You may want to read the GC handbook; it is relevant to your concerns.
Consider studying for inspiration the source code of existing open-source software such as GCC, Qt, RefPerSys, PostgreSQL, GNU Bash, etc.
Most of the time, it is simply not worth the effort to explicitly code something to manage your page cache.
I guess that mount(2) options in your /etc/fstab file (see fstab(5)) are in practice more important. Or changing or tuning your file system (e.g. ext4(5), xfs(5)). Or read(2)-ing in large pieces (e.g. 1 MByte at a time).
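For instance, a rough sketch of read(2)-ing a file in 1 MByte pieces (just an illustration, not tuned for any particular workload):

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define CHUNK (1024 * 1024)   /* 1 MByte per read(2) call */

    int main(int argc, char **argv) {
        if (argc != 2) { fprintf(stderr, "usage: %s FILE\n", argv[0]); return 1; }
        int fd = open(argv[1], O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        char *buf = malloc(CHUNK);
        if (!buf) { perror("malloc"); return 1; }

        ssize_t n;
        long long total = 0;
        while ((n = read(fd, buf, CHUNK)) > 0)
            total += n;            /* process buf[0..n-1] here */
        if (n < 0) perror("read");

        printf("read %lld bytes\n", total);
        free(buf);
        close(fd);
        return 0;
    }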
Play with dd(1) to measure. See also time(7).
Most applications are not disk-bound, and for those that are, renting more disk space is often cheaper than adding and debugging extra code.
Don't forget to benchmark, e.g. using strace(1) and time(1).
PS. Don't forget your developer costs. They are often well above the price of a RAM module (or of a faster SSD).

Is there a simple way to fork a file descriptor?

I've just read a handful of man pages: dup, dup2, fcntl, pread/pwrite, mmap, etc.
Currently I am using mmap, but it's not the nicest thing in the world because I have to manage file offset and buffer length myself and basically reimplement read/write in userspace.
From what I gathered:
dup, dup2, fcntl just create aliases for the fds, so their offsets and flags are shared - reading from one advances the offset of the others.
pread/pwrite can be buggy and give inconsistent results.
mmap is buggy on linux when given some uncommon flags, but I don't need them.
Am I missing something or is mmap really the way to go?
(Note that re-open()ing a file is dangerous on POSIX - unlike Windows, POSIX provides no guarantees on the path not being moved/deleted while the file is open. On POSIX, you can open a path, move the file, and still read from it. You can even delete the file sometimes. I also couldn't find anything that can open an inode.)
I'd like answers for at least the most common POSIX variants, if there's no one answer for them all.
On Linux, opening /proc/self/fd/$NUM will work regardless of whether the file still has the same name it had the first time you opened it, and will generate a new open file description (i.e. a new fd with independent offset and flags).
I don't know of any POSIXly portable way of doing this.
(I also don't know what you mean about pread/pwrite being buggy...)
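A minimal sketch of that Linux-specific trick in C (the helper name reopen_fd is mine):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Re-open an already-open fd via /proc/self/fd, yielding a new open
       file description with its own offset and status flags. Works even
       if the original path was renamed or unlinked (Linux only). */
    int reopen_fd(int fd, int flags) {
        char path[64];
        snprintf(path, sizeof path, "/proc/self/fd/%d", fd);
        return open(path, flags);
    }

For example, reopen_fd(fd, O_RDONLY) gives you a second descriptor whose lseek(2) position does not disturb the original's offset.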

Mandatory file lock on Linux

On Linux I can dd a file on my hard drive and delete it in Nautilus while the dd is still going on.
Can Linux enforce a mandatory file lock to protect R/W?
[EDIT] The original question wasn't about Linux's file locking capabilities but about a supposed bug in Linux; I'm reproducing it here since it is answered below and others may have the same question.
People keep telling me Linux/Unix is better OS. I am coding Java on Linux now and come across a problem, that I can easily reproduce: I can dd a file on my hard drive and delete it in Nautilus while the dd is still going on. How come linux cannot enforce a mandatory file lock to protect R/W??
To do mandatory locking on Linux, the filesystem must be mounted with the -o mand option, and you must set g-x,g+s permissions on the file. That is, you must disable group execute, and enable setgid. Once this is performed, all access will either block or error with EAGAIN based on the value of O_NONBLOCK on the file descriptor. But beware: "The implementation of mandatory locking in all known versions of Linux is subject to race conditions which render it unreliable... It is therefore inadvisable to rely on mandatory locking." See fcntl(2).
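A rough sketch of taking such a lock with fcntl(2), assuming the file system is mounted with -o mand and the file has had chmod g+s,g-x applied (data.bin is just a placeholder):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        int fd = open("data.bin", O_RDWR);   /* placeholder file name */
        if (fd < 0) { perror("open"); return 1; }

        struct flock fl = {
            .l_type   = F_WRLCK,   /* exclusive (write) lock */
            .l_whence = SEEK_SET,
            .l_start  = 0,
            .l_len    = 0,         /* 0 means "to end of file" */
        };
        if (fcntl(fd, F_SETLKW, &fl) < 0) { perror("fcntl"); return 1; }

        /* With mandatory locking in effect, other processes' read()/write()
           on this range now block or fail, per O_NONBLOCK. */

        fl.l_type = F_UNLCK;
        fcntl(fd, F_SETLK, &fl);
        close(fd);
        return 0;
    }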
You don't need locking. This is not a bug but a design choice; your assumptions are wrong.
The file system uses reference counting and will mark a file as free only when all hard links to it are removed and all file descriptors referring to it are closed.
This approach allows file system operations that Windows, for example, does not: files in use can be deleted, moved or renamed without locking and without breaking anything.
Your dd operation is going to succeed despite the file removal; the actual removal will be deferred until dd finishes.
http://en.wikipedia.org/wiki/Reference_counting#Disk_operating_systems
[EDIT] My response doesn't make much sense now, as the question was edited by someone else. The original question was about a supposed bug in Linux, not about whether Linux can lock a file:
People keep telling me Linux/Unix is better OS. I am coding Java on Linux now and come across a problem, that I can easily reproduce: I can dd a file on my hard drive and delete it in Nautilus while the dd is still going on. How come linux cannot enforce a mandatory file lock to protect R/W??
Linux and Unix OSes can enforce file locks, but they do not do so by default because of their multiuser design. Try reading the manual pages for flock and fcntl; that might get you started.
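For ordinary cooperative (advisory) locking, flock(2) is the simpler interface. A minimal sketch (shared.dat is just a placeholder):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/file.h>
    #include <unistd.h>

    int main(void) {
        int fd = open("shared.dat", O_RDWR | O_CREAT, 0644);  /* placeholder */
        if (fd < 0) { perror("open"); return 1; }

        /* Advisory exclusive lock: only cooperating processes that also
           call flock() on this file are affected. */
        if (flock(fd, LOCK_EX) < 0) { perror("flock"); return 1; }

        /* ... critical section: read/modify/write the file ... */

        flock(fd, LOCK_UN);
        close(fd);
        return 0;
    }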

Does the Linux filesystem cache files efficiently?

I'm creating a web application running on a Linux server. The application is constantly accessing a 250K file - it loads it in memory, reads it and sends back some info to the user. Since this file is read all the time, my client is suggesting to use something like memcache to cache it to memory, presumably because it will make read operations faster.
However, I'm thinking that the Linux filesystem is probably already caching the file in memory since it's accessed frequently. Is that right? In your opinion, would memcache provide a real improvement? Or is it going to do the same thing that Linux is already doing?
I'm not really familiar with either Linux or memcache, so I would really appreciate it if someone could clarify this.
Yes, if you do not modify the file each time you open it.
Linux will hold the file's information in copy-on-write pages in memory, and "loading" the file into memory should be very fast (page table swap at worst).
Edit: Though, as cdhowie points out, there is no "Linux filesystem". However, I believe the relevant code is in Linux's memory management, and is therefore independent of the filesystem in question. If you're curious, you can read in the Linux source about the handling of vm_area_struct objects, mainly in linux/mm/mmap.c.
As people have mentioned, mmap is a good solution here.
But one 250K file is very small. You might want to read it in at startup and put it in some in-memory structure that matches what you want to send back to the user. I.e., if it is a text file, an array of lines might be a good choice, etc.
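A rough sketch of that startup step for a plain text file (load_lines is a hypothetical helper name, not from this answer):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Read a text file into a heap-allocated array of lines at startup. */
    char **load_lines(const char *path, size_t *count) {
        FILE *f = fopen(path, "r");
        if (!f) return NULL;

        char **lines = NULL;
        size_t n = 0, cap = 0;
        char *line = NULL;
        size_t len = 0;

        while (getline(&line, &len, f) != -1) {
            if (n == cap) {                       /* grow the array as needed */
                cap = cap ? cap * 2 : 64;
                char **tmp = realloc(lines, cap * sizeof *lines);
                if (!tmp) break;
                lines = tmp;
            }
            lines[n++] = strdup(line);
        }
        free(line);
        fclose(f);
        *count = n;
        return lines;   /* caller frees each line and the array */
    }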
The file should be cached, but make sure the noatime option is set on the mount; otherwise every access will try to write the access time back to the file, invalidating the cache.
Yes, definitely. It will keep accessed files in memory indefinitely, unless something else needs the memory.
You can control this behaviour (to some extent) with the fadvise system call. See its "man" page for more details.
A read/write system call will still normally need to copy the data, so if you see a real bottleneck doing this, consider using mmap() which can avoid the copy, by mapping the cache pages directly into the process.
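A minimal sketch of that mmap() approach (data.txt stands in for the 250K file, assumed non-empty):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void) {
        int fd = open("data.txt", O_RDONLY);          /* placeholder file name */
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

        /* Map the page-cache pages directly into this process: no read() copy. */
        char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        /* ... parse/serve the contents from p[0 .. st.st_size - 1] ... */

        munmap(p, st.st_size);
        close(fd);
        return 0;
    }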
I guess putting that file on a ramdisk (tmpfs) may give enough of an advantage without big modifications, unless you are really serious about response times measured in microseconds.

Is there a file mutex in Linux? How to implement it?

In Windows, if I open a file with MS Word and then try to delete it, the system will stop me: it prevents the file from being deleted.
Is there a similar mechanism in Linux?
How can I implement it when writing my own program?
There is not a similar mechanism in Linux. I, in fact, find that feature of windows to be an incredible misfeature and a big problem.
It is not typical for a program to hold open a file it is working on anyway, unless the program is a database that updates the file as it works. Programs usually just open the file, write its contents, and close it when you save your document.
vim's .swp file is updated as vim works, and vim holds it open the whole time, so even if you delete it, the file doesn't really go away. vim will just lose its recovery ability if you delete the .swp file while it's running.
In Linux, if you delete a file while a process has it open, the system keeps it in existence until all references to it are gone. The name in the filesystem that refers to the file will be gone. But the file itself is still there on disk.
If the system crashes while the file is still open it will be cleaned up and removed from the disk when the system comes back up.
The reason this is such a problem in Windows is that mandatory locking frequently prevents operations that should succeed from succeeding. For example, a backup process should be able to read a file that is being written to. It shouldn't have to stop the process that is doing the writing before the backup proceeds. In many other cases, operations that should be able to move forward are blocked for silly reasons.
The semantics of most Unix filesystems (such as Linux's ext2 fs family) is that a file can be unlink(2)'d at any time, even if it is open. However, after such a call, if the file has been opened by some other process, they can continue to read and write to the file through the open file descriptor. The filesystem does not actually free the storage until all open file descriptors have been closed. These are very long-standing semantics.
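A small program demonstrating those semantics (scratch.tmp is just a placeholder name):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        int fd = open("scratch.tmp", O_RDWR | O_CREAT | O_TRUNC, 0600);
        if (fd < 0) { perror("open"); return 1; }

        /* Remove the directory entry; the inode stays alive while fd is open. */
        unlink("scratch.tmp");

        write(fd, "still here\n", 11);
        lseek(fd, 0, SEEK_SET);

        char buf[32];
        ssize_t n = read(fd, buf, sizeof buf);
        printf("read %zd bytes after unlink\n", n);

        close(fd);   /* only now is the storage actually freed */
        return 0;
    }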
You may wish to read more about file locking in Unix and Linux (e.g., the Wikipedia article on File Locking.) Basically, mandatory and advisory locks on Linux exist but they're not guaranteed to prevent what you want to prevent.
