How does the VFS know which underlying file system functions to call? - linux

Whenever we fire a command on the Linux terminal, the process thus created traverses the VFS layer, which decides which file system's function to call: ext4, ext3, or any other filesystem's. So my question is: how does the VFS differentiate between filesystems? From where does the VFS get the filesystem information? Is it the fs_struct in task_struct that tells the VFS?

As part of the FS implementation you need to implement the file, inode, and superblock operations, which registers the underlying FS ops (e.g., ext3_open()) with the VFS layer. Depending on the path to the file provided to open(), the VFS will invoke the appropriate file-system-specific implementation of the syscall.
Let's say you have already mounted a file system. When you mount a file system, you register your FS for specific operations with the VFS layer during module initialization. During this step, two handlers are registered: get_sb() and kill_sb(). get_sb() is called at the time of mounting the file system; kill_sb() is called at the time of unmounting it.
For more information, refer to RKFS and look into how the file operations are implemented, along with the data flow diagrams.
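The registration step described above can be sketched roughly as follows, against the older (pre-2.6.39) kernel API where file_system_type still has a .get_sb member; the "myfs" name, myfs_fill_super, and all handlers here are hypothetical:

```c
#include <linux/fs.h>
#include <linux/module.h>

/* Hypothetical fill_super callback: would read the on-disk
 * superblock and set up the root inode for "myfs". */
static int myfs_fill_super(struct super_block *sb, void *data, int silent);

/* Called by the VFS at mount time. */
static int myfs_get_sb(struct file_system_type *fs_type, int flags,
                       const char *dev_name, void *data,
                       struct vfsmount *mnt)
{
    return get_sb_bdev(fs_type, flags, dev_name, data,
                       myfs_fill_super, mnt);
}

static struct file_system_type myfs_type = {
    .owner   = THIS_MODULE,
    .name    = "myfs",            /* the name used with mount -t  */
    .get_sb  = myfs_get_sb,       /* mount-time handler           */
    .kill_sb = kill_block_super,  /* unmount-time handler         */
};

static int __init myfs_init(void)
{
    return register_filesystem(&myfs_type);
}

static void __exit myfs_exit(void)
{
    unregister_filesystem(&myfs_type);
}

module_init(myfs_init);
module_exit(myfs_exit);
```

After register_filesystem() succeeds, the VFS can match a `mount -t myfs` request to this module by name, and later dispatches syscalls through the operation tables that the fill_super callback installs.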

Related

libmount equivalent for FUSE filesystems

What is the libmount equivalent function to mount a FUSE file system? I understand that FUSE is not a real file system, and my strace of mount.fuse shows it opening a /dev/fuse file and doing some complicated manipulations.
I tried to see how mount.fuse works by reading its source code, but not only is it needlessly complicated by string manipulations in C, it is also a GPL program.
My question is, am I missing the obvious API to mount fuse file systems?
The kernel interface for mounting a FUSE filesystem is described in "linux/Documentation/filesystems/fuse.txt" (for example, see here).
In a nutshell, you call mount(2) as you would to mount any filesystem. However, the key difference is that you must provide a mount option fd=n where n is a file descriptor you've obtained by opening /dev/fuse and which will be used by the userspace process implementing the filesystem to respond to kernel requests.
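A minimal sketch of that interface, assuming the caller has CAP_SYS_ADMIN; the "myfs" source name and the mount_fuse helper are illustrative, and error handling is trimmed:

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mount.h>
#include <unistd.h>

/* Open /dev/fuse and mount a FUSE filesystem at mountpoint.
 * Returns the /dev/fuse fd on success, -1 on failure. */
int mount_fuse(const char *mountpoint)
{
    int fd = open("/dev/fuse", O_RDWR);
    if (fd < 0)
        return -1;

    /* fd=, rootmode=, user_id= and group_id= are the mandatory
     * mount options for the "fuse" filesystem type. */
    char opts[128];
    snprintf(opts, sizeof opts,
             "fd=%d,rootmode=40000,user_id=%d,group_id=%d",
             fd, (int)getuid(), (int)getgid());

    if (mount("myfs", mountpoint, "fuse", 0, opts) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Once mount(2) returns, the kernel queues filesystem requests on the returned fd, and the userspace process is expected to read and answer them.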
In particular, this means that the mount is actually performed by the user space program that implements the filesystem. Specifically, most FUSE filesystems use libfuse and call the function fuse_main or fuse_session_mount to perform the mount (which eventually call the internal function fuse_mount_sys in mount.c that contains the actual mount(2) system call).
So, if you want to mount a FUSE filesystem programmatically, the correct way to do this is to fork and exec the corresponding FUSE executable (e.g., sshfs) and have it handle the mount on your behalf.
Note that /sbin/mount.fuse doesn't actually mount anything itself. It's just a wrapper that allows you to mount FUSE filesystems through entries in "/etc/fstab", via the mount command line utility or at boot time. That's why you can't find any mounting code there: it mounts FUSE filesystems the same way I described above, by running the FUSE executable for the filesystem in question to perform the actual mount.

Linux: mmap() for non-regular files

I understand that mmap() allows an application to map a file into memory, so that there's a one-to-one correspondence between a memory address and a word in the file.
But my question is: what if the file is a non-regular file created by a device driver? As far as I know, some non-regular files are mmap-able and some are not. What does that mean from a programming perspective? What should I do if I want my non-regular file to be mmap-able?
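For the regular-file case, that one-to-one correspondence can be checked directly from user space (a minimal sketch; the temporary file name is arbitrary):

```c
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map a regular file and check the one-to-one correspondence
 * between file bytes and mapped memory. Returns 0 on success. */
int mmap_roundtrip(void)
{
    char path[] = "/tmp/mmap_demo_XXXXXX";   /* arbitrary temp name */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                   /* file persists until fd closes */

    const char msg[] = "hello";
    if (write(fd, msg, sizeof msg) != (ssize_t)sizeof msg)
        return -1;

    char *map = mmap(NULL, sizeof msg, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        return -1;

    if (map[0] != 'h')              /* byte 0 of file == byte 0 of map */
        return -1;

    map[0] = 'H';                   /* store through the mapping ...   */

    char buf[sizeof msg];
    if (pread(fd, buf, sizeof msg, 0) != (ssize_t)sizeof msg)
        return -1;
    int ok = (buf[0] == 'H');       /* ... is visible via read(2) too  */

    munmap(map, sizeof msg);
    close(fd);
    return ok ? 0 : -1;
}
```

With MAP_SHARED, Linux backs the mapping and read(2)/write(2) with the same page cache pages, which is why the store through the pointer is visible to pread().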
I have worked on a Linux kernel module in which I implemented the mmap function pointer (struct file_operations). This module creates a device entry in the /dev/ directory. A user-space application opens this entry using open() and then makes an mmap() system call. Eventually, the mmap function implemented inside the kernel module I insmod-ed is called, does its processing, and returns to user space.
This was just an example to illustrate a service requested by user space from the OS (kernel).
Whenever the user wants to access hardware or request a service from the kernel (like mapping physical memory into the user virtual address space), it can do so through the entry created by the driver in /dev/, /sys/, /proc/, etc. These files can be seen as a "virtual interface" to the kernel.
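To make a device file mmap-able as described, the driver supplies an .mmap handler in its file_operations. A rough kernel-side sketch (the mydev names and the dev_buf buffer are hypothetical; dev_buf is assumed to be a page-aligned kernel buffer):

```c
#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/module.h>
#include <asm/io.h>

static void *dev_buf;   /* hypothetical page-aligned driver buffer */

/* Called when userspace mmap()s the /dev entry: map the driver's
 * pages into the calling process's address space. */
static int mydev_mmap(struct file *filp, struct vm_area_struct *vma)
{
    unsigned long size = vma->vm_end - vma->vm_start;
    unsigned long pfn  = virt_to_phys(dev_buf) >> PAGE_SHIFT;

    if (remap_pfn_range(vma, vma->vm_start, pfn, size,
                        vma->vm_page_prot))
        return -EAGAIN;
    return 0;
}

static const struct file_operations mydev_fops = {
    .owner = THIS_MODULE,
    .mmap  = mydev_mmap,
};
```

A file whose file_operations lacks an .mmap handler is exactly the "not mmap-able" case: calling mmap() on it fails (typically with ENODEV).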

Is it correct to call procfs the VFS?

A virtual file system (VFS) or virtual filesystem switch is an abstraction layer on top of a more concrete file system. The purpose of a VFS is to allow client applications to access different types of concrete file systems in a uniform way.
This definition seems to be perfect if we see the actual work of VFS.
But in some places people call procfs and sysfs virtual file systems too, because they (procfs and sysfs) do not actually exist on a storage medium and are based on dynamic information collected from different processes.
So is it correct to call procfs the VFS? I do not feel so; if it were correct, then we would not be keeping to the VFS definition: the VFS is a layer for interoperating among various file systems, not a particular file system in itself. What do you say?
Procfs, sysfs, debugfs, etc are not the VFS.
They are proper filesystem implementations, lying 'under' the VFS layer.
It's important to realize that they are real filesystems in all respects; it's just that they "live" in RAM. As they don't use a non-volatile storage medium, they are sometimes referred to as 'volatile' filesystems or pseudo-fs.
I would like to mention what I finally concluded: VFS stands for Virtual Filesystem Switch when used as an abstraction layer, because it handles switching between filesystems on the fly. Procfs, even if we consider it a filesystem, should be called a virtual file system, not the VFS.

lock file or partition for read and write system calls

I need to know how to write a system call that locks and unlocks a file (inode) or a partition (super_block) for the read and write functions.
For example, these functions are in fs.h:
lock_super(struct super_block *);
unlock_super(struct super_block *);
How to obtain the super_block (/dev/sda1 for example)?
The lock_super and unlock_super calls are not meant to be invoked directly by user-level processes. They are only meant to be called by the VFS layer when an operation on an inode in the filesystem is requested by a user process. If you still wish to do that, you have to write your own device driver and expose the desired functionality (locking and unlocking of the inode) to user level.
There are no current system calls that allow you to lock and unlock inodes. There are many reasons why it is unwise to implement a new system call without due consideration, but if you wish to do so, you would need to write the handler for your own system call in the kernel. It seems you want fine-grained control over the file system; perhaps you are implementing a user-level file system.
As for how to get the super_block: every file-system module registers itself with the VFS (Virtual File System). The VFS acts as an intermediate layer between the user and the actual file system, so it is the VFS that knows the function pointers to the lock_super and unlock_super methods. The VFS superblock contains the device info and a set of pointers to the file-system superblock; you can get those pointers from there and call them. But remember: because the actual file system is managed by the VFS, you could potentially corrupt data.
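If you do go the kernel-module route, the super_block for a mounted filesystem can be reached through the dentry that a path on it resolves to. A hedged kernel-side sketch (the sb_of helper name is hypothetical; this assumes the pathname, e.g. the mount point of /dev/sda1, is already mounted):

```c
#include <linux/fs.h>
#include <linux/namei.h>

/* Resolve a pathname inside the kernel and return the superblock
 * of the filesystem it lives on, or NULL on lookup failure. */
static struct super_block *sb_of(const char *pathname)
{
    struct path path;
    struct super_block *sb;

    if (kern_path(pathname, LOOKUP_FOLLOW, &path))
        return NULL;

    sb = path.dentry->d_sb;  /* every dentry points at its superblock */
    path_put(&path);
    return sb;
}
```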

How to read/write from/to a linux /proc file from kernel space?

I am writing a program consisting of a user program and a kernel module. The kernel module needs to gather data that it will then "send" to the user program, and this has to be done via a /proc file. I create the file and everything is fine, but I have spent ages reading the internet for an answer and still cannot find one. How do you read/write a /proc file from kernel space? The write_proc and read_proc supplied to the proc file are used to read and write data from user space, whereas I need the module to be able to write the /proc file itself.
That's not how it works. When a userspace program opens the files, their contents are generated on the fly, on a case-by-case basis. Most of them are read-only and generated by a common mechanism:
Register an entry with create_proc_read_entry
Supply a callback function (called read_proc by convention) which is called when the file is read
This callback function should populate a supplied buffer and (typically) call proc_calc_metrics to update the file pointer, etc., supplied to userspace.
You (from the kernel) do not "write" to procfs files, you supply the results dynamically when userspace requests them.
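Those two steps, sketched against the legacy procfs API this answer describes (pre-3.10 kernels; the my_stats entry name and the constant value are hypothetical):

```c
#include <linux/module.h>
#include <linux/proc_fs.h>

/* Legacy read_proc callback: runs when userspace reads the file
 * and fills the supplied page buffer on demand. */
static int my_read_proc(char *page, char **start, off_t off,
                        int count, int *eof, void *data)
{
    int len = sprintf(page, "value=%d\n", 42);
    *eof = 1;               /* all data delivered in one shot */
    return len;
}

static int __init my_init(void)
{
    /* Nothing is "written" up front; content is generated per read. */
    create_proc_read_entry("my_stats", 0444, NULL, my_read_proc, NULL);
    return 0;
}
```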
One of the approaches to get data across to the user space would be seq_files. In order to configure (write) kernel parameters you may want to consider sys-fs nodes.
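For the seq_file approach mentioned above, a minimal single_open-based sketch (API of roughly the 3.x/4.x kernels; from 5.6 on, proc_create takes a proc_ops table instead; names are hypothetical):

```c
#include <linux/module.h>
#include <linux/proc_fs.h>
#include <linux/seq_file.h>

/* Produces the file's contents each time it is read. */
static int my_show(struct seq_file *m, void *v)
{
    seq_printf(m, "value=%d\n", 42);
    return 0;
}

static int my_open(struct inode *inode, struct file *file)
{
    return single_open(file, my_show, NULL);
}

static const struct file_operations my_fops = {
    .owner   = THIS_MODULE,
    .open    = my_open,
    .read    = seq_read,
    .llseek  = seq_lseek,
    .release = single_release,
};

static int __init my_init(void)
{
    proc_create("my_stats", 0444, NULL, &my_fops);
    return 0;
}
```

seq_file handles buffering and partial reads for you, which is why it is the preferred way to expose kernel data through /proc.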
