What's the relationship between `struct file` and file descriptor? - linux

I know that a struct file is a kernel representation for an opened file and a file descriptor is for user space to use. But I still have some questions:
Is struct file and fd many-to-one? Can multiple file descriptors in the same process share one struct file?
If the answer is yes, can multiple file descriptors in different processes share the same struct file?
If the answer is yes, how does the kernel keep track of the fd-specific information such as the current file offset?

Related

Resolving file descriptor to file name / file path

I am currently developing a simple kernel module that can steal system calls such as open, read, write and replace them with a simple function which logs the files being opened, read, written, into a file and return the original system calls.
My query is, I am able to get the File Descriptor in read and write system calls, but I am not able to understand how to obtain file name using the same.
Currently I am able to access the file structure associated with given FD using following code:
struct file *file;
file = fcheck(fd);
This file structure has two important entities in it, which are of my concern I believe:
f_path
f_inode
Can anybody help me get dentry or inode or the path name associated with this fd using the file structure associated with it?
Is my approach correct? Or do I need to do something different?
I am using Ubuntu 14.04 and my kernel version is 3.19.0-25-generic, for the kernel module development.
.f_inode is actually an inode.
.f_path.dentry is a dentry (f_path is a struct path embedded in struct file, so it is accessed with '.', not '->').
Walking up from this dentry through the ->d_parent links until the f_path.mnt->mnt_root dentry is reached, and collecting the dentry->d_name components along the way, constructs the file's path relative to the mount point. This is what d_path does, e.g., but in a more careful way.
Instead of fcheck(fd), which must be used inside an RCU read-side critical section, you can also use fget(fd), which must be paired with fput().
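For cross-checking what a module like this logs, the same fd-to-path mapping can be read from user space through the /proc/self/fd/<n> symbolic links. The sketch below is only a user-space counterpart, not kernel code; fd_to_path and show_resolved_path are hypothetical helper names, and the result is whatever path the kernel reports for that descriptor.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy the path behind `fd` into buf (NUL-terminated).
 * Returns the path length, or -1 on error. A path longer than size - 1
 * bytes is silently truncated by readlink(). */
ssize_t fd_to_path(int fd, char *buf, size_t size)
{
    char link[64];
    ssize_t n;

    if (size == 0)
        return -1;
    snprintf(link, sizeof link, "/proc/self/fd/%d", fd);
    n = readlink(link, buf, size - 1);   /* readlink() does not NUL-terminate */
    if (n < 0)
        return -1;
    buf[n] = '\0';
    return n;
}

/* Open `path`, resolve the new descriptor back to a name, and print it.
 * Returns 0 on success, -1 on error. */
int show_resolved_path(const char *path)
{
    char buf[4096];
    ssize_t n;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    n = fd_to_path(fd, buf, sizeof buf);
    if (n >= 0)
        printf("fd %d -> %s\n", fd, buf);
    close(fd);
    return n >= 0 ? 0 : -1;
}
```

Calling show_resolved_path() on a file the module has just logged gives a name to compare against the d_path result.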
The approach is completely incorrect - see http://www.watson.org/~robert/2007woot/
Linux already has a reliable mechanism for doing this thing (audit). If you want to implement it anyway (for fun I presume), you want to place your hooks roughly where audit is doing that. Chances are LSM hooks are in appropriate places, have not checked.

Linux Implement open file descriptors C

1) Is there any alternative to looping through /proc in order to get the total number of open file descriptors?
I used the following dirs:
/proc/PID/fd/*
/proc/PID/maps
/proc/PID/cwd
/proc/PID/root
/proc/PID/exe
2) The number is different from lsof | wc -l and cat /proc/sys/fs/file-nr
3) Can loaded dynamically linked libraries and current working directories be counted as open file descriptors?
Implementation all open file descriptors in C for Linux
How you count this depends on what information you are interested in.
Looking through /proc/PID/fd/* will give you the number of open file descriptors for that process. One caveat is that two processes may share an open file: after a fork, the child inherits the parent's file descriptors, so this method counts the same open file twice, once for each process.
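A minimal sketch of that per-process count, assuming /proc is mounted and you have permission to read the target's fd directory. count_open_fds is a hypothetical name; when a process inspects itself, the descriptor that opendir() uses internally is skipped so it does not inflate the result.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: count the descriptors listed in /proc/<pid>/fd.
 * Returns the count, or -1 if the directory cannot be opened. */
int count_open_fds(pid_t pid)
{
    char dirpath[64];
    struct dirent *ent;
    DIR *dir;
    int own_fd, count = 0;

    snprintf(dirpath, sizeof dirpath, "/proc/%ld/fd", (long)pid);
    dir = opendir(dirpath);
    if (dir == NULL)
        return -1;

    /* When looking at ourselves, opendir() has just added one descriptor. */
    own_fd = (pid == getpid()) ? dirfd(dir) : -1;

    while ((ent = readdir(dir)) != NULL) {
        if (ent->d_name[0] == '.')
            continue;                       /* skip "." and ".." */
        if (own_fd >= 0 && atoi(ent->d_name) == own_fd)
            continue;                       /* skip the directory stream itself */
        count++;
    }
    closedir(dir);
    return count;
}
```

Summing this over all PIDs gives a descriptor count, which can legitimately be larger than the number of kernel file handles for the reasons given in this answer.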
/proc/PID/maps will show you the memory map of the process, which can include the loaded executable itself and dynamically linked libraries, but also includes things that don't correspond to files like the heap, the stack, the vdso section which is a virtual shared object exported by the kernel, and so on.
lsof will list a variety of ways that files can be in use, which includes more than just file descriptors; it also includes the executable and shared libraries, but does not include the memory regions that don't correspond to files that show up in /proc/PID/maps like the stack, heap, vdso section, etc.
/proc/sys/fs/file-nr will report the number of open kernel file handles. A kernel file handle is different from a file descriptor; more than one open file descriptor can point to the same file handle, for instance after calling dup or dup2.
These differences explain why you're getting different numbers from these different ways of counting. The question is, what purpose are you using this count for? That will help answer which way of counting you should actually use.
1) No, but you seem to be confused about what constitutes an open file descriptor, as your second question suggests.
2) See http://codingtragedy.blogspot.com/2015/04/nofile-ulimit-n-rlimitnofile-most.html - while it explains the handling of resource limits, which may seem irrelevant, it also explains the difference between a file descriptor and a 'struct file', which is most likely what you want, and it even covers your lsof usage.
3) Again, it is unclear what your actual question is. The current working directory is not a file descriptor; the kernel holds it as a reference to the directory itself (its dentry and inode), not as a 'struct file'. A process may or may not keep an fd for a linked library around, but the mapping itself occupies a 'struct file'.

Is file object local to every process or System wide?

As a Linux device driver developer, I was under the impression that the file object is a structure local to each process and that its address is stored in the fd table entry for the corresponding fd. But then I came across section 5.6 of The Linux Programming Interface by Michael Kerrisk, which states:
Two different file descriptors that refer to the same open file description share
a file offset value. Therefore, if the file offset is changed via one file descriptor
(as a consequence of calls to read(), write(), or lseek()), this change is visible
through the other file descriptor. This applies both when the two file descriptors
belong to the same process and when they belong to different processes.
I am befuddled. Could someone kindly help me improve my understanding?
Each process does have its own file descriptor table, and each call to open() yields a separate open file description. So there is sanity there!
The exception is when a file descriptor is duplicated, either within a process (via dup()) or across processes (by one process fork()ing a copy with all the same FDs, or by passing a file descriptor through a UNIX domain socket). When this happens, the two descriptors end up sharing some properties with each other, including the offset.
This is not necessarily a bad thing. It means, for instance, that two processes that are both writing to a shared file descriptor will not end up overwriting each other's output. It can sometimes have unexpected results, though. But it's not usually something that you'd end up with without knowing about it.
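The sharing rules above can be observed directly. This is a rough sketch, assuming a writable /tmp; observe_offsets and struct offsets are made-up names. It reads 4 bytes through one descriptor and then looks at the offset through a dup()ed descriptor, through an independent open() of the same file, and again after a forked child has read through the inherited descriptor.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct offsets {
    off_t via_dup;      /* offset seen through a dup()ed descriptor */
    off_t via_reopen;   /* offset seen through a second, separate open() */
    off_t after_child;  /* parent's offset after a forked child has read */
};

/* Returns 0 on success, -1 on any error. */
int observe_offsets(struct offsets *out)
{
    char tmpl[] = "/tmp/ofd-demo-XXXXXX";   /* hypothetical scratch file */
    char buf[4];
    int dup_fd = -1, reopened = -1, status = 0, rc = -1;
    pid_t pid;
    int fd = mkstemp(tmpl);

    if (fd < 0)
        return -1;
    if (write(fd, "0123456789", 10) != 10 || lseek(fd, 0, SEEK_SET) != 0)
        goto out;

    dup_fd = dup(fd);                   /* shares fd's open file description */
    reopened = open(tmpl, O_RDONLY);    /* gets its own open file description */
    if (dup_fd < 0 || reopened < 0)
        goto out;

    if (read(fd, buf, sizeof buf) != 4) /* moves the shared offset to 4 */
        goto out;
    out->via_dup = lseek(dup_fd, 0, SEEK_CUR);
    out->via_reopen = lseek(reopened, 0, SEEK_CUR);

    pid = fork();
    if (pid < 0)
        goto out;
    if (pid == 0)                       /* child: read via the inherited fd */
        _exit(read(fd, buf, sizeof buf) == 4 ? 0 : 1);
    if (waitpid(pid, &status, 0) != pid ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        goto out;
    out->after_child = lseek(fd, 0, SEEK_CUR);
    rc = 0;
out:
    if (reopened >= 0)
        close(reopened);
    if (dup_fd >= 0)
        close(dup_fd);
    close(fd);
    unlink(tmpl);
    return rc;
}
```

On success via_dup is 4 and via_reopen is 0, and after_child is 8 because the child's read advanced the offset that the parent shares with it.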

In Linux, how to create a file descriptor for a memory region

I have some program handling some data either in a file or in some memory buffer. I want to provide uniform way to handle these cases.
I can either 1) mmap the file so we can handle them uniformly as a memory buffer; 2) create FILE* using fopen and fmemopen so access them uniformly as FILE*.
However, I can't use either of the ways above. I need to handle both as file descriptors, because one of the libraries I use only takes a file descriptor, and it calls mmap on that file descriptor.
So my question is: given a memory buffer (we can assume it is aligned to 4K), can we get a file descriptor that is backed by this memory buffer? I saw popen suggested as an answer to another question, but I don't think the fd from popen can be mmap-ed.
You cannot easily create a file descriptor (other than a C standard library one, which is not helpful) from "some memory region". However, you can create a shared memory region, getting a file descriptor in return.
From shm_overview (7):
shm_open(3)
Create and open a new object, or open an existing object. This is analogous to open(2). The call returns a file descriptor for use by the other interfaces listed below.
Among the listed interfaces is mmap, which means that you can "memory map" the shared memory the same as you would memory map a regular file.
Thus, using mmap for both situations (file or memory buffer) should work seamlessly, if only you control creation of that "memory buffer".
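A minimal sketch of the shm_open route, under a few assumptions: fd_from_buffer and the "/buf-<pid>" object name are invented for illustration, the data is copied into the shared memory object rather than the original buffer being wrapped, and older glibc versions may need -lrt at link time.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy `len` bytes into a fresh POSIX shared memory
 * object and return a descriptor that other code can mmap().
 * Returns -1 on error. */
int fd_from_buffer(const void *data, size_t len)
{
    char name[64];
    void *map;
    int fd;

    if (len == 0)
        return -1;                       /* mmap() rejects zero-length maps */

    snprintf(name, sizeof name, "/buf-%ld", (long)getpid());
    fd = shm_open(name, O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;
    shm_unlink(name);                    /* the open fd keeps the object alive */

    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }
    map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }
    memcpy(map, data, len);              /* fill the object with the buffer */
    munmap(map, len);
    return fd;                           /* hand this to the library */
}
```

If you control where the buffer is allocated, the copy can be avoided by creating the shared memory object first and using its mapping as the buffer from the start.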
You could write (perhaps using mmap) your data segment to a tmpfs based file (perhaps under /run/ directory), then pass the opened file descriptor to your library.

Multiple file descriptors to the same file, C

I have a multithreaded application that is opening and reading the same file (not writing). I am opening a different file descriptor for each thread (but they all point to the same file). Each thread then reads the file and may close it and open it again if EOF is reached. Is this OK? If I call fclose() on one file descriptor, does it affect the other file descriptors that point to the same file?
For Linux systems you don't need multiple file descriptors to do this. You can share a single file descriptor and use pread to atomically do a seek / read operation without modifying the file descriptor at all.
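A small sketch of the pread approach, assuming a regular (seekable) file; read_chunk and demo_shared_fd are hypothetical names and the scratch file under /tmp is only there for the self-check. Each caller passes its own offset, so nothing is stored in the shared descriptor between calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Read up to `len` bytes starting at `offset` without touching the
 * descriptor's own file offset. Returns the number of bytes read
 * (short only at EOF), or -1 on error. */
ssize_t read_chunk(int fd, void *buf, size_t len, off_t offset)
{
    size_t done = 0;

    while (done < len) {
        ssize_t n = pread(fd, (char *)buf + done, len - done,
                          offset + (off_t)done);
        if (n < 0) {
            if (errno == EINTR)
                continue;                /* interrupted, retry */
            return -1;
        }
        if (n == 0)
            break;                       /* end of file */
        done += (size_t)n;
    }
    return (ssize_t)done;
}

/* Self-check on a scratch file: two positioned reads through one descriptor
 * return the expected bytes and leave the descriptor's offset unchanged.
 * Returns 0 on success, -1 otherwise. */
int demo_shared_fd(void)
{
    char tmpl[] = "/tmp/pread-demo-XXXXXX";  /* hypothetical scratch file */
    char a[4], b[4];
    int ok = 0;
    int fd = mkstemp(tmpl);

    if (fd < 0)
        return -1;
    unlink(tmpl);
    if (write(fd, "0123456789", 10) == 10) {
        off_t before = lseek(fd, 0, SEEK_CUR);
        ok = read_chunk(fd, a, sizeof a, 0) == 4 && memcmp(a, "0123", 4) == 0
          && read_chunk(fd, b, sizeof b, 6) == 4 && memcmp(b, "6789", 4) == 0
          && lseek(fd, 0, SEEK_CUR) == before;
    }
    close(fd);
    return ok ? 0 : -1;
}
```

In a threaded reader, each thread would keep its own position in a local variable and pass it to read_chunk(), so no locking around the descriptor is needed for reads.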
That's OK. You can open the same file as many times as you want, and each file descriptor will be independent of the others.
That should work fine, provided each thread has its own file handle. Since you mention use of fclose(), that suggests you are also using fopen() in each thread and each thread only affects its own FILE * variable.
Is there a problem?

Resources