Why do we need directory structure for file system? - linux

JOS, from MIT's OS course, uses only the File structure to describe both regular files and directories.
But the Linux kernel uses the dentry, inode, and file structures to describe files.
Is it necessary for a file system to use dentries?

In Linux, a dentry is a directory entry: it links a name in a path to an inode, and an open file object refers to the dentry it was opened through. A dentry is not necessarily a directory; it can just as well represent a regular file. Dentries are what make hard links possible: several dentries can point to the same inode, so you can give a single file multiple names.
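To see two names sharing one inode from user space, here is a minimal sketch using the POSIX link() and stat() calls. It creates a file, adds a second name for it, and checks that both names resolve to the same inode with a link count of 2. The helper name hard_link_demo and the file names "a" and "b" are illustrative assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Creates "a" inside dir, hard-links it as "b", and returns 1 if both
 * names refer to the same inode with a link count of 2, 0 otherwise.
 * Both names are removed again before returning. */
int hard_link_demo(const char *dir)
{
    char a[4096], b[4096];
    struct stat sa, sb;

    snprintf(a, sizeof a, "%s/a", dir);
    snprintf(b, sizeof b, "%s/b", dir);

    int fd = open(a, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0)
        return 0;
    close(fd);

    if (link(a, b) != 0)          /* a second directory entry, same inode */
        return 0;
    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return 0;

    int same = sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino
               && sa.st_nlink == 2;
    unlink(b);
    unlink(a);
    return same;
}
```

Calling it on a fresh directory from mkdtemp() should return 1: the file has one inode but two names.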
The dentry cache also matters a great deal for file system performance. The following picture, from "Understanding the Linux Kernel, 3rd Edition", shows the interactions between processes and VFS objects.

JOS does use directory entries. It just uses the File object to store directories (the same object holds both directory data and file data).

Related

what does the author mean by directory structure in operating system?

I'm reading Operating System Concepts by Avi Silberschatz (9th edition). In section 11.4, File-System Mounting, the author explains the steps of file-system mounting as follows:
The operating system is given the name of the device and the mount point: the location within the file structure where the file system is to be attached.
Next, the operating system verifies that the device contains a valid file system.
Finally, the operating system notes in its directory structure that a file system is mounted at the specified mount point.
I'm confused by the final step because, to the best of my knowledge, the directory structure is stored somewhere on the disk and records the files' information, such as name, location, size, and type. So what does the author mean by the operating system's directory structure? Is it the same as the directory structure on disk?
Additionally, which component performs the conversion from a file name to a physical address on disk? Is it the disk driver, the disk controller, or the processor working with memory?
What you are reading is largely nonsense. To begin with, it is Unix-specific. Unix variants tend to have a single directory tree containing all disks, and even things that are not really files.
Let us assume that you are on Windows. If you mount a disk, the drive gets a name, typically a single letter, although longer names are possible in some cases. Say you mount a disk drive and the system assigns it "Q:".
So now Q: is available, and you can access files by specifying something like
"Q:\dir1\dir2\file.type"
You are just accessing the directory structure that exists on Q:.
Each drive has a separate, independent directory structure.
Many operating systems work this way, and the sequence in your book is irrelevant to them.
Unix variants do not work this way. The system maintains a single directory tree starting at "/", the root directory for the whole system. That combined tree, and the record of which file system is attached where, is maintained by the operating system rather than stored on any one disk drive.
On a Mac, for instance, there is a "/Volumes" directory containing a mount point for each mounted drive, for example:
"/Volumes/Macintosh HD"
"/Volumes/Backup Drive"
These mount points then lead to the directories stored on those disks. Thus, in Unix, the directory tree maintained by the operating system and the directories stored on each disk are merged into one.
So if you want to find "/Volumes/Backup Drive/dir/something.txt", the system starts at the root "/" and finds "Volumes". It then finds "Backup Drive" and determines that a disk drive is mounted there, so it continues from the root directory of that drive, finds that "dir" is a directory on the drive, and finally finds the file something.txt.
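The lookup just described can be modelled with a toy in-memory tree in which a node may have another file system's root "mounted" on it. This is a minimal sketch under that assumption, not how any particular kernel stores mounts; struct node, child(), and resolve() are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* A toy in-memory directory node; all names here are hypothetical. */
struct node {
    const char *name;
    struct node *children[8];
    size_t nchildren;
    struct node *mounted;   /* root of a file system mounted here, or NULL */
};

/* Finds the child of dir whose name equals the first len bytes of name. */
static struct node *child(struct node *dir, const char *name, size_t len)
{
    for (size_t i = 0; i < dir->nchildren; i++) {
        const char *n = dir->children[i]->name;
        if (strlen(n) == len && strncmp(n, name, len) == 0)
            return dir->children[i];
    }
    return NULL;
}

/* Resolves an absolute path one component at a time. Whenever a
 * component is a mount point, lookup continues in the mounted root. */
struct node *resolve(struct node *root, const char *path)
{
    struct node *cur = root;
    const char *p = path;

    while (*p) {
        while (*p == '/')
            p++;
        if (!*p)
            break;
        size_t len = strcspn(p, "/");
        cur = child(cur, p, len);
        if (!cur)
            return NULL;
        if (cur->mounted)
            cur = cur->mounted;   /* cross into the mounted file system */
        p += len;
    }
    return cur;
}
```

With "Backup Drive" marked as a mount point over a separate root, resolving "/Volumes/Backup Drive/dir/something.txt" ends in that other tree, exactly as in the walk-through above.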
To add to the confusion, there are disk formats that have no directory structure at all. This illustrates that your book is taking you down a confusing path.
Each disk drive has a format of some kind, for example NTFS, ODS-11, FAT, ....
What I am telling you from here on is a generalization of what typically happens; there are large variations in how it works among systems.
Typically, each drive has a header that includes a description of the block clusters in use (often a bitmap) and of the files on the disk. A file description usually holds the file name, creation date, owner, and so on, as well as information about where the file's data is stored on the disk.
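A cluster bitmap like the one mentioned above can be sketched in a few lines: one bit per cluster, set when the cluster is in use. The 64-cluster size and the names struct bitmap, alloc_cluster(), and free_cluster() are assumptions for illustration; real formats differ in layout and allocation policy.

```c
#include <stddef.h>
#include <stdint.h>

/* A toy allocation bitmap: bit i set means cluster i is in use.
 * The 64-cluster size is an arbitrary assumption for illustration. */
#define NCLUSTERS 64

struct bitmap {
    uint8_t bits[NCLUSTERS / 8];
};

static int bit_is_set(const struct bitmap *bm, size_t i)
{
    return (bm->bits[i / 8] >> (i % 8)) & 1;
}

/* Finds the first free cluster, marks it used, and returns its index,
 * or -1 if every cluster is allocated. */
long alloc_cluster(struct bitmap *bm)
{
    for (size_t i = 0; i < NCLUSTERS; i++) {
        if (!bit_is_set(bm, i)) {
            bm->bits[i / 8] |= (uint8_t)(1u << (i % 8));
            return (long)i;
        }
    }
    return -1;
}

/* Marks a cluster free again. */
void free_cluster(struct bitmap *bm, size_t i)
{
    bm->bits[i / 8] &= (uint8_t)~(1u << (i % 8));
}
```

This first-fit search is the simplest policy; real file systems usually try to keep a file's clusters close together.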
The drive often has a directory structure in which some file is designated as the root directory. The directory structure is built by creating directory files within other directory files. A directory is normally just a file containing a list of file names, each with the address of that file's description in the disk header. Other file attributes, such as the file size and creation date, are not stored in the directory; you get them from the file description in the disk header.
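The split between directory entries and file descriptions can be sketched as two record types: a directory file is an array of (name, header index) pairs, and the size and location live only in the header table. These layouts, including the 28-byte name field, are hypothetical and do not match any specific on-disk format.

```c
#include <string.h>

/* Hypothetical on-disk layouts, loosely following the description above. */
struct file_header {            /* lives in the disk header */
    unsigned long size;
    unsigned long first_block;
};

struct dir_entry {              /* a directory file is an array of these */
    char name[28];
    unsigned int header_index;  /* index into the disk header table */
};

/* Looks a name up in a directory's entries and returns the matching
 * file header, or NULL. Size and location come from the header, not
 * from the directory entry. */
const struct file_header *lookup(const struct dir_entry *dir, size_t n,
                                 const struct file_header *headers,
                                 const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(dir[i].name, name) == 0)
            return &headers[dir[i].header_index];
    return NULL;
}
```

Because an entry stores only an index, entries in two different directories can carry the same index, which is how one file can appear in several directories.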
The file structure in the disk header is separate from the directory structure. In fact, it is often possible to create a file that is not even in a directory at all. Or you can put a single file in multiple directories.
If your disk gets trashed and has to be recovered, this is usually done by looking at the disk header. You get back your files but lose your directory structure.
Additionally, which part finishes the conversion from file name to physical address on disk? Is it the disk driver or the disk controller or done by processor with memory?
The logical location on the disk is specified in the file description in the disk header. The format of that information is specific to the underlying disk format. Generally you have two paths to reach the file description:
You can go through the list of file headers maintained by the disk; or
You can navigate a directory structure until you find the file name you want with a link to the file description.

Resolving file descriptor to file name / file path

I am currently developing a simple kernel module that hooks system calls such as open, read, and write, replacing them with a simple function that logs the files being opened, read, or written to a file and then calls the original system call.
My question: I can get the file descriptor in the read and write system calls, but I don't understand how to obtain the file name from it.
Currently I can access the file structure associated with a given FD using the following code:
struct file *file;
file = fcheck(fd);
This file structure has two members that I believe are relevant:
f_path
f_inode
Can anybody help me get the dentry, inode, or path name associated with this fd from its file structure?
Is my approach correct? Or do I need to do something different?
I am using Ubuntu 14.04 and my kernel version is 3.19.0-25-generic, for the kernel module development.
.f_inode is the inode.
.f_path.dentry is the dentry.
Following this dentry's ->d_parent links until you reach the f_path.mnt->mnt_root dentry, and collecting each dentry->d_name component along the way, constructs the file's path relative to the mount point. This is what d_path() does, for example, but more carefully.
Instead of fcheck(fd), which must be used inside an RCU read-side critical section, you can also use fget(fd), which must be paired with fput().
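The ->d_parent walk can be illustrated with a user-space model. The sketch below uses a hypothetical struct mini_dentry holding only a name and a parent pointer, and builds the path from the end of the buffer backwards. Unlike d_path(), it does no locking and does not cross mount boundaries, so it should be read as an illustration of the idea rather than kernel code.

```c
#include <string.h>

/* A user-space stand-in for struct dentry: just a name and a parent.
 * In this toy model the root's parent points to itself, like the
 * kernel's mount-root dentry. Names here are hypothetical. */
struct mini_dentry {
    const char *name;
    struct mini_dentry *parent;
};

/* Builds the path of d relative to root by walking ->parent links and
 * prepending each name. Returns 0 on success, -1 if buf is too small. */
int mini_path(const struct mini_dentry *d, const struct mini_dentry *root,
              char *buf, size_t size)
{
    size_t pos = size - 1;   /* fill from the end of buf backwards */
    buf[pos] = '\0';

    for (; d != root && d->parent != d; d = d->parent) {
        size_t len = strlen(d->name);
        if (pos < len + 1)   /* need room for the name and a '/' */
            return -1;
        pos -= len;
        memcpy(buf + pos, d->name, len);
        buf[--pos] = '/';
    }
    if (pos == size - 1) {   /* d was the root itself */
        if (pos < 1)
            return -1;
        buf[--pos] = '/';
    }
    memmove(buf, buf + pos, size - pos);   /* shift result to the front */
    return 0;
}
```

Filling the buffer backwards avoids reversing the components afterwards; the kernel's path-building helpers use the same trick.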
The approach is completely incorrect; see http://www.watson.org/~robert/2007woot/ (system call wrappers like this are vulnerable to race conditions).
Linux already has a reliable mechanism for this (audit). If you want to implement it anyway (for fun, I presume), place your hooks roughly where audit does. Chances are the LSM hooks are in appropriate places, but I have not checked.

Syncing a file system that has no file on it

Say I want to flush the data buffers of a file system to disk (in my case, a USB stick partition) on a Linux box.
While searching for a function to do that, I found the following:
DESCRIPTION
sync() causes all buffered modifications to file metadata and data to be written to the underlying file systems.
syncfs(int fd) is like sync(), but synchronizes just the file system containing file referred to by the open file descriptor fd.
But what if the file system has no file on it that I can open and pass to syncfs? Can I "abuse" the dot file? Does it appear on all file systems?
Is there another function that does what I want? Perhaps by providing a device file with major / minor numbers or some such?
Yes, I think you can do that. The root directory of your file system always has an inode of its own, so you can open the "." entry of the root directory (that is, the mount point) and pass that descriptor to syncfs. Play around with ls -i to see the inode numbers.
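As a sketch of that suggestion: open the mount point as a directory and pass the descriptor to the Linux-specific syncfs(). The helper name sync_fs_at is illustrative, and a mount point such as "/media/usb" would be a hypothetical example path.

```c
#define _GNU_SOURCE            /* syncfs() is Linux-specific */
#include <fcntl.h>
#include <unistd.h>

/* Flushes the file system that contains `path` (for example the mount
 * point of an empty USB stick). Any directory on that file system works,
 * because syncfs() only uses the fd to identify the file system.
 * Returns 0 on success, -1 on error (errno is set). */
int sync_fs_at(const char *path)
{
    int fd = open(path, O_RDONLY | O_DIRECTORY);
    if (fd < 0)
        return -1;
    int rc = syncfs(fd);
    close(fd);
    return rc;
}
```

Opening with O_DIRECTORY needs no write permission and works even when the file system contains no regular files at all.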
Could you avoid the problem by mounting your file system with the sync option? Would the performance cost be a problem? Have you looked at remounting? In some cases that syncs your file system as well.
I do not know what your application is, but I have had problems synchronizing files to a USB stick with a FAT32 file system: it resulted in weird read and write errors. I cannot imagine any other valid reason to sync an empty file system.
From the man 8 sync description:
"sync writes any data buffered in memory out to disk. This can include (but is not limited to) modified superblocks, modified inodes, and delayed reads and writes. This must be implemented by the kernel; The sync program does nothing but exercise the sync(2) system call."
So note that it's all about modifications (modified inodes, superblocks, and so on). If nothing has been modified, there is nothing to sync.

How to check in linux kernel at vfs layer whether the file object is for a directory or a file

How can I check, at the VFS layer of the Linux kernel, whether a file object refers to a directory or a regular file?
I found a function called is_dx(dir) which checks for this, but it lives in namei.c in ext3/ext4. I need to do this at the VFS layer, independently of the file system.
How about the S_ISDIR() macro defined in include/linux/stat.h? It takes the inode->i_mode field and checks whether the inode in question is a directory.
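Inside the kernel the check is typically written as S_ISDIR(file_inode(file)->i_mode). The same macro exists in user space, so the idea can be demonstrated there with fstat(). This is a user-space analogue only; the helper name fd_is_dir is illustrative.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if fd refers to a directory, 0 if it refers to something
 * else, and -1 if fstat() fails. This is the user-space analogue of
 * testing inode->i_mode with S_ISDIR() inside the kernel. */
int fd_is_dir(int fd)
{
    struct stat st;
    if (fstat(fd, &st) != 0)
        return -1;
    return S_ISDIR(st.st_mode) ? 1 : 0;
}
```

The mode bits come from the inode, so the test is file-system independent, which is what the question asks for.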
- Having in hand the inode of the initial directory, the code examines the entry matching the first name to get the corresponding inode.
- Then the directory file having that inode is read from disk, and the entry matching the second name is examined to derive the corresponding inode.
- This procedure is repeated for each name included in the path.
The dentry cache considerably speeds up this procedure.
File system operations are mostly done at the dcache level, so they are all performed under the kernel lock.
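The component-by-component lookup and the effect of a dentry cache can be modelled together. In this sketch the "disk" is a static table of (parent inode, name, inode) entries, every miss counts as one directory read, and resolved components are remembered in a small cache. All names, the table contents, and the root inode number 1 are assumptions for illustration.

```c
#include <string.h>

/* Toy "disk": each directory entry maps (parent inode, name) to an inode.
 * Everything here is a hypothetical model, not kernel code. */
struct entry { int parent; const char *name; int ino; };

static const struct entry disk[] = {
    { 1, "usr", 2 }, { 2, "bin", 3 }, { 3, "ls", 4 },
};
static int disk_reads;   /* counts simulated directory reads */

static int read_dir_entry(int parent, const char *name, size_t len)
{
    disk_reads++;
    for (size_t i = 0; i < sizeof disk / sizeof disk[0]; i++)
        if (disk[i].parent == parent && strlen(disk[i].name) == len &&
            strncmp(disk[i].name, name, len) == 0)
            return disk[i].ino;
    return -1;
}

/* A tiny "dentry cache" of previously resolved components. */
#define CACHE_SIZE 16
static struct { int parent; char name[32]; int ino; } cache[CACHE_SIZE];
static int cached;

static int lookup_component(int parent, const char *name, size_t len)
{
    for (int i = 0; i < cached; i++)
        if (cache[i].parent == parent && strlen(cache[i].name) == len &&
            strncmp(cache[i].name, name, len) == 0)
            return cache[i].ino;            /* cache hit: no disk read */
    int ino = read_dir_entry(parent, name, len);
    if (ino >= 0 && cached < CACHE_SIZE && len < sizeof cache[0].name) {
        cache[cached].parent = parent;
        memcpy(cache[cached].name, name, len);
        cache[cached].name[len] = '\0';
        cache[cached].ino = ino;
        cached++;
    }
    return ino;
}

/* Resolves an absolute path starting from the root inode (1), one name
 * at a time, as in the steps above. Returns the inode or -1. */
int path_lookup(const char *path)
{
    int ino = 1;
    while (*path) {
        while (*path == '/')
            path++;
        if (!*path)
            break;
        size_t len = strcspn(path, "/");
        ino = lookup_component(ino, path, len);
        if (ino < 0)
            return -1;
        path += len;
    }
    return ino;
}
```

Resolving "/usr/bin/ls" twice costs three simulated reads the first time and none the second. This model does not cache failed lookups, whereas Linux also keeps negative dentries for names that do not exist.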

Why can't files be manipulated by inode?

Why is it that you cannot access a file when you only know its inode, without searching for a file that links to that inode? A hard link to the file contains nothing but a name and a number telling you where to find the inode with all the real information about the file. I was surprised when I was told that there was no usermode way to use the inode number directly to open a file.
This seems like such a harmless and useful capability for the system to provide. Why is it not provided?
Security reasons -- to access a file you need permission on the file AS WELL AS permission to search all the directories from the root needed to get at the file. If you could access a file by inode, you could bypass the checks on the containing directories.
This allows you to create a file that can be accessed by a set of users (or a set of groups) and no one else: create directories that are accessible only by those users (one directory per user), then hard-link the file into all of those directories. The file itself is accessible by anyone, but it can only actually be reached by someone who has search permission on one of the directories it is linked into.
Some operating systems do have that facility. For example, OS X needs it to support the Carbon File Manager, and on Linux you can use debugfs. Of course, you can do it on any UNIX from the command line via find -inum, but the real reason you can't access files by inode is that it isn't particularly useful. It does kind of circumvent file permissions: if there's a file you can read in a folder you can't read or execute, opening it by inode lets you discover it.
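The find -inum approach amounts to a brute-force tree walk comparing inode numbers. Here is a minimal sketch using POSIX nftw(), roughly equivalent to find with -xdev and -inum; the helper names and the file-scope search state are illustrative choices, needed because nftw() callbacks take no user argument.

```c
#define _XOPEN_SOURCE 700      /* for nftw() */
#include <ftw.h>
#include <string.h>
#include <sys/stat.h>

/* Search state; nftw() callbacks cannot take extra arguments,
 * so this sketch uses file-scope variables. */
static ino_t wanted_ino;
static dev_t wanted_dev;
static char found_path[4096];

static int visit(const char *path, const struct stat *st, int type,
                 struct FTW *ftw)
{
    (void)ftw;
    if (type == FTW_NS)      /* stat failed: st is not valid */
        return 0;
    if (st->st_ino == wanted_ino && st->st_dev == wanted_dev) {
        strncpy(found_path, path, sizeof found_path - 1);
        found_path[sizeof found_path - 1] = '\0';
        return 1;            /* nonzero stops the walk */
    }
    return 0;
}

/* Walks the tree under top without crossing mount points or following
 * symlinks, and returns the first path whose inode matches, or NULL. */
const char *path_for_inode(const char *top, dev_t dev, ino_t ino)
{
    wanted_ino = ino;
    wanted_dev = dev;
    found_path[0] = '\0';
    int rc = nftw(top, visit, 16, FTW_PHYS | FTW_MOUNT);
    return rc == 1 ? found_path : NULL;
}
```

The cost is a full directory traversal, and it only finds names the caller has permission to search, which is the permission argument made above.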
The reason it isn't very useful is that you need to find an inode number via a *stat() call, at which point you already have the filename (or an open fd)...or you need to guess the inum.
In response to your comment: to "pass a file", you can use fd passing over AF_LOCAL sockets by means of SCM_RIGHTS (see man 7 unix).
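Passing a descriptor with SCM_RIGHTS looks roughly like this: the sender attaches the fd as ancillary data to a one-byte message, and the receiver gets a new descriptor for the same open file. This is a minimal sketch following the pattern in the unix(7) and cmsg(3) manual pages; the helper names send_fd and recv_fd are illustrative.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Sends one file descriptor over a connected AF_UNIX (AF_LOCAL) socket
 * using an SCM_RIGHTS control message. Returns 0 on success, -1 on error. */
int send_fd(int sock, int fd)
{
    char byte = 0;                     /* must send at least one data byte */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {                            /* correctly aligned control buffer */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    memset(&u, 0, sizeof u);

    struct msghdr msg = { 0 };
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof u.buf;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receives a descriptor sent by send_fd(). Returns the new fd (a fresh
 * number in the receiving process) or -1 on error. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;

    struct msghdr msg = { 0 };
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof u.buf;

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The receiver never looks up a path, so it gets access to the file without needing search permission on any directory along the way.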
Btrfs does have an ioctl for that (BTRFS_IOC_INO_PATHS, added in this patch); however, it makes no attempt to check permissions along the path and is simply restricted to root.
Surely if you've already looked up a file via a path, you shouldn't have to do it again and again?
stat(f,&s); i=open(f,O_MODE);
involves two trawls through a directory structure. This wastes CPU cycles with unnecessary string operations. Yes, the well-designed file system cache will hide most of this inefficiency from a casual end-user, but repeating work for no reason is ugly if not plain silly.
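For the common stat-then-open case, one way to avoid the second lookup is to open the file once and call fstat() on the descriptor. This is a standard POSIX idiom rather than something proposed above. It also closes the window in which the path could be swapped between the two calls. The helper name open_and_stat is illustrative.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Opens `path` and fills *st from the open descriptor, so the directory
 * tree is walked only once (by open) instead of once for stat() and
 * again for open(). Returns the fd, or -1 on error. */
int open_and_stat(const char *path, int flags, struct stat *st)
{
    int fd = open(path, flags);
    if (fd < 0)
        return -1;
    if (fstat(fd, st) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```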

Resources