Determining filesystem for block device if module not loaded - linux

I've been wondering about this for a while now. When using Linux and plugging in, e.g., a USB stick or an external storage device via USB, how does the kernel determine which filesystem is on that device if the correct module is not currently loaded in memory?
Assume that the external storage device is ext4 formatted. At the time of plugging in the device, the ext4 module has not been loaded into memory yet. Now, usually the kernel tries to probe the different filesystems by calling the appropriate *_fill_super function of the respective module. But that only works if the module is already present at the time of probing. How does the kernel handle mounting devices in situations where the correct filesystem module is not yet loaded in memory?
To me it seems to be a kind of chicken-and-egg problem, as the *_fill_super function needed to determine the correct module is located in the module itself. So the kernel wants to probe the filesystem on the device, but for that it requires the appropriate module to be loaded. But it is not loaded at that time, so the kernel does not know what filesystem is on the device and thus does not know which filesystem module to load in the first place.
How does the kernel handle this? I'm thankful for any explanation or even references to the code where this logic happens.

In short: Unless some user space configuration file is aware of a filesystem, mount will fail to auto-detect this filesystem if its driver is not loaded.
.. usually the kernel tries to probe the different filesystems by calling the appropriate *_fill_super function of the respective module.
This is not quite true. The mount system call accepts the filesystem name as a parameter, and this parameter is required in all cases except re-mount, which is out of scope of the current question. Moreover, the filesystem driver is expected to be already loaded at the time the mount system call is performed.
That is, the kernel by itself never "probes" a filesystem; it just tries the filesystem requested by user space.
Actually, it is the user space tool mount which performs the probing.
Before probing, the mount tool attempts to deduce the filesystem by inspecting /etc/fstab, by using blkid, or by other mechanisms.
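To illustrate how a user space tool such as blkid can identify a filesystem without any kernel driver being involved: it reads well-known magic numbers straight from the block device. Below is a minimal sketch for the ext2/3/4 family only. The function name is made up for illustration, and the real libblkid checks many more filesystems and fields than this.

```python
import struct

EXT_SUPERBLOCK_OFFSET = 1024   # the ext2/3/4 superblock starts 1024 bytes into the device
EXT_MAGIC_OFFSET = 0x38        # offset of the s_magic field inside the superblock
EXT_MAGIC = 0xEF53             # 16-bit little-endian magic shared by ext2, ext3 and ext4


def looks_like_ext(image):
    """Return True if the raw device bytes carry the ext2/3/4 magic number."""
    pos = EXT_SUPERBLOCK_OFFSET + EXT_MAGIC_OFFSET
    if len(image) < pos + 2:
        return False
    (magic,) = struct.unpack_from("<H", image, pos)
    return magic == EXT_MAGIC


# Fake 2 KiB "device" with the magic in place, for demonstration only.
fake = bytearray(2048)
struct.pack_into("<H", fake, EXT_SUPERBLOCK_OFFSET + EXT_MAGIC_OFFSET, EXT_MAGIC)
print(looks_like_ext(bytes(fake)))   # True
print(looks_like_ext(bytes(2048)))   # False
```

On a real system one would pass in the first 2 KiB read from the device node (which requires read permission on it). Telling ext4 apart from ext2/ext3 additionally needs the feature flags in the superblock, which this sketch does not look at.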
After all deducing mechanisms have failed, the mount tool probes filesystems according to the /etc/filesystems file. This file can contain a predefined (static) list of filesystems.
If /etc/filesystems doesn't exist (or, according to man mount, if it ends with a line containing a single *), the mount tool probes the filesystems listed in /proc/filesystems. This file is provided by the kernel and contains all filesystems which are registered (for which a driver is loaded); entries marked nodev are skipped.
After attempts for every line in /proc/filesystems have failed, the mount tool reports an error.
The algorithm of choosing a filesystem by the mount tool is described in man mount under the paragraph "If no -t option is given ..". The question and its answers describe similar things.
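The fallback order can be sketched roughly as follows. This is a simplified model based on the description in man mount, not the actual mount implementation; the function names are hypothetical, and the file contents are passed in as strings so that the example runs anywhere.

```python
def parse_fs_list(text):
    """Parse /etc/filesystems or /proc/filesystems content, skipping 'nodev' entries."""
    names = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue            # blank line or comment
        if fields[0] == "nodev":
            continue            # not backed by a block device, never tried
        names.append(fields[0])
    return names


def candidate_filesystems(etc_text, proc_text):
    """Return the filesystem types to try, in order.

    etc_text is None when /etc/filesystems does not exist.
    """
    if etc_text is None:
        return parse_fs_list(proc_text)
    names = parse_fs_list(etc_text)
    if names and names[-1] == "*":
        # A trailing "*" asks to continue with /proc/filesystems.
        names.pop()
        names += [n for n in parse_fs_list(proc_text) if n not in names]
    return names


proc = "nodev\tsysfs\nnodev\tproc\n\text4\n\tbtrfs\n"
print(candidate_filesystems("ext4\nvfat\n*\n", proc))   # ['ext4', 'vfat', 'btrfs']
print(candidate_filesystems(None, proc))                # ['ext4', 'btrfs']
```

Note that a filesystem whose driver is not loaded never shows up in /proc/filesystems, which is why it can only be tried if blkid recognises it or if it is listed in /etc/filesystems.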

Related

Device mapper, boot with virtual device

I have a task to create a virtual device backed by a real one with the help of the device mapper kernel module. The virtual device must pass every request on to the real device, so both devices must be equivalent.
Eventually I should be able to control the requests, so I wrote a kernel module implementing a device mapper target, following this article.
After building the module and inserting it (insmod), I set up my device (dmsetup create). Then I mount it and can work with the real device through the newly created virtual one.
But the question is how to repeat the above steps at boot time? I'd like to use my virtual device as a regular one (by changing fstab, I guess).
Thanks in advance!
If you are going to use your device as the root filesystem, you need to create an initramfs that sets it up. Basically a shell script that issues dmsetup commands, followed by a mount and finally pivot_root to the new filesystem.
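As a rough sketch of the sequence such an initramfs script would run, the helper below only builds the command lines and does not execute them. The device names, the mount point and the use of the stock linear target (standing in for a custom target) are all assumptions made for illustration.

```python
def dm_table_line(start_sector, num_sectors, target, *args):
    """Format one device-mapper table line: '<start> <length> <target> <args...>'."""
    return " ".join(str(x) for x in (start_sector, num_sectors, target, *args))


def boot_commands(name, table, new_root="/newroot"):
    """Return the commands in the order described: dmsetup, mount, pivot_root."""
    return [
        ["dmsetup", "create", name, "--table", table],
        ["mount", "/dev/mapper/" + name, new_root],
        ["pivot_root", new_root, new_root + "/oldroot"],
    ]


# Hypothetical 1 GiB mapping (2097152 sectors of 512 bytes) onto /dev/sda2.
table = dm_table_line(0, 2097152, "linear", "/dev/sda2", 0)
for cmd in boot_commands("vroot", table):
    print(" ".join(cmd))
```

With a custom target, the target name and its arguments in the table line would be whatever the module's constructor expects, and the module has to be loaded before dmsetup create runs.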
There was a discussion on the dm-devel mailing list last year on how to do this in the Linux kernel without an initramfs, by specifying mapper lines on the kernel command line. This is the way Chrome OS does it, because they can't/won't use an initramfs. See here for documentation of this feature. The functionality was never merged though.
The patch series was updated and resubmitted in May 2017. Hopefully we will eventually see it merged in some form or other.
If you are not going to use your device as the root filesystem, you can still use the same approach if you want, but there might be easier ways.

what is the difference between accessing a pci device using procfs vs sysfs

procfs file: /proc/bus/pci/00/00.0
vs.
sysfs file: /sys/bus/pci/devices/0000:00:00.0/resource
I have seen some drivers use the procfs file and some use sysfs. What is the difference? For what I need, I find that mmap-ing the sysfs resource<N> file and reading/writing it works as I need it to, but a similar operation on the procfs file does not work. But obviously the procfs file is used successfully elsewhere.
The procfs file you cite (/proc/bus/pci/00/00.0) provides access to the device's configuration header. It is also accessible in sysfs as /sys/bus/pci/devices/0000:00:00.0/config.
The sysfs file you're talking about (/sys/bus/pci/devices/0000:00:00.0/resource<N>) provides access to the device's BAR regions. See https://en.wikipedia.org/wiki/PCI_configuration_space for an explanation of the relationships. Also, you may want to read the linux kernel documentation at
https://www.kernel.org/doc/Documentation/filesystems/sysfs-pci.txt
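To make the relationship concrete: the config file holds the configuration header, and the BARs are fields inside that header which describe the regions exposed as resource<N>. Here is a minimal sketch that decodes a few header fields from raw bytes. The function name and the sample values are made up for illustration; on a real system the bytes would be read from the config file.

```python
import struct


def parse_pci_config_header(cfg):
    """Decode a few fields of the standard 64-byte PCI configuration header."""
    if len(cfg) < 64:
        raise ValueError("need at least 64 bytes of configuration space")
    vendor_id, device_id, command, status = struct.unpack_from("<HHHH", cfg, 0)
    header_type = cfg[0x0E] & 0x7F          # bit 7 is the multi-function flag
    # Only a type 0 (normal device) header carries six BARs at 0x10..0x27.
    bars = list(struct.unpack_from("<6I", cfg, 0x10)) if header_type == 0 else []
    return {
        "vendor_id": vendor_id,
        "device_id": device_id,
        "command": command,
        "status": status,
        "header_type": header_type,
        "bars": bars,
    }


# Fake header for demonstration: example IDs and one memory BAR.
fake = bytearray(64)
struct.pack_into("<HH", fake, 0, 0x8086, 0x1237)
struct.pack_into("<I", fake, 0x10, 0xFE000000)
info = parse_pci_config_header(bytes(fake))
print(hex(info["vendor_id"]), hex(info["device_id"]), hex(info["bars"][0]))
```

The raw BAR values also carry type bits in their low bits, so they are not plain addresses; the resource files in sysfs hide that detail and expose the decoded regions directly.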

Linux: How does communication with kernel module from user space happen

I am reading the Embedded Linux Primer book and The Linux Kernel Module Programming Guide, and I am confused about how a user space application communicates with a kernel module.
User space App --> Device node/proc file --> kernel module ( which resides in /lib/modules/)
1) What is the exact difference between communicating via the device node method (/dev/ with open, read, write, close calls) and the /proc file method?
procfs (/proc) should be reserved for process information; a module should not put any files there. At some point, procfs was the only available pseudo filesystem, which is why you can find sound system or RTC information in it. Then sysfs was created to properly hold that information.
The main difference between using a device file (usually residing in /dev) and a file from procfs is the way it is handled in the kernel.
Operations for device files are registered using the file_operations structure, usually with cdev_init and cdev_add for a character device. Your module may not do that itself since, quite often, the subsystem is the one registering your device.
The operations for files in procfs, on the other hand, are registered using proc_create.

How can my kernel module access a PCI device without using pci_get_device()?

At present, I have a Linux 2.6 kernel module which accesses a certain device via pci_get_device() and pci_read_config_dword(). In the future, the module shall be modified to also work on a different machine which seems to have no PCI bus (/sys/bus/pci doesn't exist), but has the device at a fixed, known address. Now, I would like to have one module binary without parameters which works on both machines. To be able to load the module on the non-PCI machine, I think I must refrain from using pci_get_device() etc.; thus I have to get the needed config space info on the PCI machine in some other way. I could read it from /sys/bus/pci/devices/.../resource in my init_module(), but I gather it is considered bad practice to make kernel modules read files. Are there better ways to achieve my goal?
When functions like pci_get_device() cannot be used (because the module must work also with kernels that don't provide such functions), apparently there is no better way to get the PCI address info than reading /sys/bus/pci/devices/.../resource.
I resorted to doing so, using filp_open(), vfs_read() and filp_close() on the basis of How to read/write files within a Linux kernel module?.
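For reference, each line of the resource file holds three hexadecimal values: start address, end address and flags. The sketch below only illustrates that format in user space Python; it is not the kernel code used in the module, and the function name and sample values are made up.

```python
def parse_pci_resource(text):
    """Parse the content of a sysfs PCI 'resource' file into a list of regions."""
    regions = []
    for index, line in enumerate(text.splitlines()):
        fields = line.split()
        if len(fields) != 3:
            continue                      # not a "start end flags" line
        start, end, flags = (int(f, 16) for f in fields)
        if start == 0 and end == 0:
            continue                      # unused resource slot
        regions.append({
            "index": index,
            "start": start,
            "size": end - start + 1,      # the end address is inclusive
            "flags": flags,
        })
    return regions


sample = (
    "0x00000000fe000000 0x00000000fe01ffff 0x0000000000040200\n"
    "0x0000000000000000 0x0000000000000000 0x0000000000000000\n"
    "0x000000000000c000 0x000000000000c03f 0x0000000000040101\n"
)
for region in parse_pci_resource(sample):
    print(region["index"], hex(region["start"]), hex(region["size"]))
```

Inside a kernel module the same three values would have to be extracted from the buffer filled by vfs_read().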

How to communicate with a Linux kernel module from user space without littering /dev with new nodes?

What are the ways to communicate with a kernel module from user space? By communication I mean sending information and commands between the kernel module and a user space process.
I currently know of two ways:
open/close/read/write/ioctl on published device node.
read/write on exported and hooked /proc file.
More specifically, can someone advise on the best way to communicate with a kernel module that does not actually drive any hardware and therefore should not be littering /dev with stub nodes that exist solely for ioctl calls? I mostly need to check its various status variables and send it a block of data with a request type tag and see if the request succeeded.
Netlink sockets are designed for that kind of requirement, too...
Also see
man 7 netlink
libnl - Netlink library
The libnl Archives
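To give an idea of what travels over a netlink socket: every message starts with a 16-byte header (struct nlmsghdr in man 7 netlink) followed by a payload padded to a 4-byte boundary. Here is a minimal sketch of packing such a message. The message type and the payload are hypothetical, since what a given module expects is up to that module.

```python
import struct

# struct nlmsghdr: nlmsg_len, nlmsg_type, nlmsg_flags, nlmsg_seq, nlmsg_pid
NLMSG_HDR = struct.Struct("=IHHII")   # host byte order, 16 bytes
NLM_F_REQUEST = 0x01
NLMSG_MIN_TYPE = 0x10                 # types below this are reserved for control messages


def nlmsg_align(length):
    """Round up to the 4-byte netlink alignment."""
    return (length + 3) & ~3


def build_nlmsg(msg_type, flags, seq, pid, payload):
    """Return header + payload + padding; nlmsg_len excludes the trailing padding."""
    length = NLMSG_HDR.size + len(payload)
    padding = b"\0" * (nlmsg_align(length) - length)
    return NLMSG_HDR.pack(length, msg_type, flags, seq, pid) + payload + padding


msg = build_nlmsg(NLMSG_MIN_TYPE, NLM_F_REQUEST, 7, 0, b"hello")
print(len(msg), NLMSG_HDR.unpack_from(msg))
```

On Linux such a buffer would be sent through a socket created with socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, protocol), where the protocol number is whichever one the kernel module registered.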
There's also the /sys filesystem (sysfs):
Sysfs exports information about devices and drivers from the kernel device model to userspace, and is also used for configuration.
(from Wikipedia)
You could also read/write from /dev device nodes.
IMHO, /dev is already littered with stuff and adding your own nodes there isn't a big issue. Don't forget that you can have lots of ioctl codes for a single device node, and the ioctl parameters are passed by reference so can be as big as you like.
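To show how many distinct commands fit behind one node: an ioctl request number encodes a direction, an argument size, a driver "type" character and a command number. The sketch below mirrors the generic _IOC layout used on x86 and ARM (some architectures use different bit widths). The 'k' type character and the two command names are hypothetical.

```python
IOC_NRBITS, IOC_TYPEBITS, IOC_SIZEBITS = 8, 8, 14
IOC_NRSHIFT = 0
IOC_TYPESHIFT = IOC_NRSHIFT + IOC_NRBITS        # 8
IOC_SIZESHIFT = IOC_TYPESHIFT + IOC_TYPEBITS    # 16
IOC_DIRSHIFT = IOC_SIZESHIFT + IOC_SIZEBITS     # 30
IOC_NONE, IOC_WRITE, IOC_READ = 0, 1, 2


def ioc(direction, type_char, nr, size):
    """Compose an ioctl request number the way the generic _IOC() macro does."""
    return ((direction << IOC_DIRSHIFT)
            | (size << IOC_SIZESHIFT)
            | (ord(type_char) << IOC_TYPESHIFT)
            | (nr << IOC_NRSHIFT))


# Hypothetical commands for one device node of a module.
MYMOD_GET_STATUS = ioc(IOC_READ, "k", 1, 4)     # kernel writes a 4-byte status back
MYMOD_SUBMIT = ioc(IOC_WRITE, "k", 2, 64)       # user passes a 64-byte request block
print(hex(MYMOD_GET_STATUS), hex(MYMOD_SUBMIT))  # 0x80046b01 0x40406b02
```

With 8 bits for the command number, a single node can distinguish up to 256 commands per type character, each with its own argument structure.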
A third one is to add a new syscall, but the two you have written are the preferred ones, I think. I've found this document that might help, but I still think this option is inadvisable: http://www.csee.umbc.edu/courses/undergraduate/CMSC421/fall02/burt/projects/howto_add_systemcall.html
Another acceptable option might be sharing memory.
You could also use Shared Memory and IOCTL
debugfs is another good possibility for APIs that are less stable than sysfs, but the API is basically the same. Here is a minimal runnable example.
configfs is another one. It allows easy dynamic creation of kernel objects from userspace through the filesystem: https://www.kernel.org/doc/Documentation/filesystems/configfs/configfs.txt
In any case, you will have to dirty some namespace... a filesystem entry in case of sysfs and debugfs. Just choose your poison.
Also, udev rules make /dev very similar to sysfs and debugfs: How to create a device in /dev automatically upon loading of the kernel module for a device driver?

Resources