What module is the I/O scheduler? (Linux)

At this point I have no need to modify the schedulers, though that may change. Presently, my endeavor is to understand them. I've done a fair amount of reading on the subject from a variety of sources: Wikipedia, Linux Kernel Development 2nd edition (ch. 10), Linux Driver Development 3rd edition (ch. 13) and a handful of others. I've got a fair understanding of the 4 main schedulers and how they work. However, I'm not yet sure what they are.
From the code, e.g. block/noop-iosched.c, each scheduler appears to be a kernel module. But when I do lsmod, I don't see anything that jumps out as being a scheduler: nothing is named noop or cfq, for example. Further, I don't see anything like
<scheduler> <size> <used> scsi_transport_sas
which is what I would expect to see, since it is the SAS transport that dequeues requests from the request queue and hands them to the LLD. At least, I'm assuming I should see something like this, because I see this output from lsmod for my LLD:
scsi_transport_sas 35652 1 mpt3sas
This mid-layer driver, scsi_transport_sas, is used by mpt3sas, the driver for my actual SAS controller. Since the mid-layer driver dequeues for the device, I'm assuming a similar relationship would exist between the mid-layer and the I/O scheduler.
So, my question is: what are the schedulers? Are they modules? Are they integrated components of the kernel? Are they software libraries that expose the required functionality and are compiled in with the rest of the storage-stack drivers? The references I mentioned earlier are great at explaining the work they do and how block drivers interact with them, but they don't exactly say what they are.
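For reference, this is roughly the shape of block/noop-iosched.c that I'm looking at, trimmed to the module boilerplate (2.6-era elevator API; the actual request-handling callbacks in .ops are omitted here), which is part of why I assumed the schedulers are modules:

/* Trimmed sketch of block/noop-iosched.c (2.6-era API). */
#include <linux/elevator.h>
#include <linux/module.h>

static struct elevator_type elevator_noop = {
	/* .ops = { .elevator_dispatch_fn = ..., .elevator_add_req_fn = ..., }, */
	.elevator_name  = "noop",
	.elevator_owner = THIS_MODULE,
};

static int __init noop_init(void)
{
	/* Registers "noop" with the block layer so it can be selected per
	 * queue via /sys/block/<dev>/queue/scheduler. */
	elv_register(&elevator_noop);
	return 0;
}

static void __exit noop_exit(void)
{
	elv_unregister(&elevator_noop);
}

module_init(noop_init);
module_exit(noop_exit);
MODULE_LICENSE("GPL");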

Related

In OS, why loadable kernel modules (LKMs) don't need to invoke message passing in order to communicate?

My question concerns a paragraph, shown below; I can't understand the bold sentence. If a module doesn't need to invoke message passing, how does communication happen?
Modules
Perhaps the best current methodology for operating-system design involves
using loadable kernel modules (LKMs). Here, the kernel has a set of core
components and can link in additional services via modules, either at boot time
or during run time. This type of design is common in modern implementations
of UNIX, such as Linux, macOS, and Solaris, as well as Windows.
The idea of the design is for the kernel to provide core services, while
other services are implemented dynamically, as the kernel is running. Linking
services dynamically is preferable to adding new features directly to the kernel,
which would require recompiling the kernel every time a change was made.
Thus, for example, we might build CPU scheduling and memory management
algorithms directly into the kernel and then add support for different file
systems by way of loadable modules.
The overall result resembles a layered system in that each kernel section
has defined, protected interfaces; but it is more flexible than a layered system,
because any module can call any other module. The approach is also similar to
the microkernel approach in that the primary module has only core functions
and knowledge of how to load and communicate with other modules; but it
is more efficient, because modules do not need to invoke message passing in
order to communicate.
Linux uses loadable kernel modules, primarily for supporting device
drivers and file systems. LKMs can be “inserted” into the kernel as the system is started (or booted) or during run time, such as when a USB device is
plugged into a running machine. If the Linux kernel does not have the necessary driver, it can be dynamically loaded. LKMs can be removed from the
kernel during run time as well. For Linux, LKMs allow a dynamic and modular
kernel, while maintaining the performance benefits of a monolithic system. We
cover creating LKMs in Linux in several programming exercises at the end of
this chapter.
In OS, why loadable kernel modules (LKMs) don't need to invoke message passing in order to communicate?
The simple answer is that, because they're loaded into kernel space and dynamically linked, the kernel can use "mostly normal" function calls instead of anything more expensive (message passing, remote procedure calls, ...) to communicate with them.
Note: Typically (especially for *nix systems) a driver will provide a set of function pointers to the kernel (e.g. maybe one for open(), one for read(), one for ioctl(), etc.) in some kind of "device context" structure, allowing the kernel to call the driver's functions via the function pointers (e.g. result = deviceContext->open(...);).
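In Linux terms, that "device context" full of function pointers is struct file_operations. A rough sketch of a char driver handing one to the kernel (the demo_* names are invented for illustration):

#include <linux/fs.h>
#include <linux/module.h>

static int demo_major;

static int demo_open(struct inode *inode, struct file *filp)
{
	return 0;                               /* nothing to set up in this sketch */
}

static ssize_t demo_read(struct file *filp, char __user *buf,
                         size_t len, loff_t *off)
{
	return 0;                               /* always report EOF */
}

/* The "set of function pointers" handed to the kernel. */
static const struct file_operations demo_fops = {
	.owner = THIS_MODULE,
	.open  = demo_open,
	.read  = demo_read,
};

static int __init demo_init(void)
{
	/* Major 0 = let the kernel pick one. From here on, a read(2) on a
	 * matching device node is a plain function call through demo_fops.read;
	 * no message passing is involved. */
	demo_major = register_chrdev(0, "demo", &demo_fops);
	return demo_major < 0 ? demo_major : 0;
}

static void __exit demo_exit(void)
{
	unregister_chrdev(demo_major, "demo");
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");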
"The approach is also similar to the microkernel approach in that the primary module has only core functions and knowledge of how to load and communicate with other modules; but it is more efficient, because modules do not need to invoke message passing in order to communicate."
This paragraph has the potential to give you a false impression. For extensibility alone, modular monolithic kernels are similar to micro-kernels (and both are a lot more extensible than a "literally monolithic (one piece, like stone)" kernel). For other things (e.g. security) modular monolithic kernels are extremely dissimilar to micro-kernels.
For Linux specifically; you can think of it as almost 30 million lines (growing at a rate of over 1 million lines per year) of potential security vulnerabilities running at the highest privilege level with full access to every scrap of data, with an average of about 150 discovered critical vulnerabilities per year (and who knows how many undiscovered critical vulnerabilities).
One of the main goals of micro-kernels is to place isolation barriers between the "kernel core" and everything else; so that you might end up with several thousand lines of kernel that doesn't grow (and a significant improvement in security). It's those isolation barriers that require less efficient communication (e.g. message passing).
"...but it is more efficient, because modules do not need to invoke message passing in order to communicate."
This could be rephrased more correctly as "...but it is more efficient, because modules do not need to pass through an isolation barrier."
Note that message passing is merely one way to pass through an isolation barrier - there's shared memory, signals, pipes, sockets, remote procedure calls, etc. Nothing says a micro-kernel has to use message passing and you could design a micro-kernel that does not use message passing at all.

Linux kernel: Find all drivers reachable via syscalls

I am comparing a mainline Linux kernel source with a modified copy of the same source that has many drivers added. A little background: That modified source is an Android kernel source, it contains many drivers added by the vendor, SoC manufacturer, Google etc.
I am trying to identify all drivers added in the modified source that are reachable from userspace via any syscalls. I'm looking for some systematic or ideally automatic way to find all these to avoid the manual work.
For example, char device drivers are of interest, since I could perform some openat, read, write, ioctl and close syscalls on them if there is a corresponding device file. To find new character device drivers, I could first find all new files in the source tree and then grep them for struct file_operations. But besides char drivers, what else is there that I need to look for?
I know that the syscalls mentioned above do some kind of "forwarding" to the respective device driver associated with the file. But are there other syscalls that do this kind of forwarding? I think I would have to focus on all these syscalls, right?
Is there something I can grep for in source files that indicates that syscalls can lead there? How should I go about this to find all these drivers?
Update (narrowing down):
I am targeting specific devices (e.g. Huawei P20 Lite), so I know the relevant architecture and hardware. But for the sake of this question, we can just assume that the hardware for any given driver is present. It doesn't really matter in my case if I invoke a driver and it reports back that no corresponding hardware is present, as long as I can invoke the driver.
I am only looking for drivers directly reachable via syscalls. By directly reachable I mean drivers designed to have some syscall interface with userspace. Yes, syscalls not aimed at a certain driver may still indirectly trigger code in that driver, but these indirect effects can be neglected.
Maybe some background on my objective clarifies things: I want to fuzz-test the drivers I find using Syzkaller. For this, I would write descriptions of the syscalls usable to fuzz each driver, which Syzkaller then parses.
I'm pretty sure there is no way to do this programmatically. Any attempt to do so would hit up against a couple of problems:
The drivers that are called in a given case depend on the hardware. For example, on my laptop, the iwlwifi driver will be reachable via network syscalls, but on a server that driver won't be used.
Virtually any code loaded into the kernel is reachable from some syscall if the hardware is present. Drivers interact with hardware, which in turn either interacts with users, external devices, or networks, and all of these operations are reachable by syscalls. People don't write drivers that don't do anything.
Even drivers that aren't directly reachable by a system call can affect execution. For example, a driver for a true RNG would be able to affect execution by changing the behavior of the system PRNG, even if it weren't accessible by /dev/hwrng.
So for a generic kernel that can run on any hardware of a given architecture, it's going to be pretty hard to exclude any driver from consideration. If your hope is to trace the execution of the code by some programmatic means without actually executing it, then you're going to need to solve the halting problem.
Sorry for the bad news.

Quick questions about Linux kernel modules

I'm very familiar with Linux (I've been using it for 2 years, no Windows for 1 1/2 years), and I'm finally digging deeper into kernel programming and working on a project. So my questions are:
Will a kernel module run faster than a traditional C program?
How can I communicate with a module (is that even possible), for example call a function in it?
1. Will a kernel module run faster than a traditional C program?
It Depends™
Running as a kernel module means you get to play by different rules, you potentially get to avoid some context switches depending on what you are doing. You get access to some powerful tools that can be leveraged to optimize your code, but don't expect your code to run magically faster just by throwing everything in kernelspace.
2. How can I communicate with a module (is that even possible), for example call a function in it?
There are various ways:
You can use the various file system interfaces: procfs, sysfs, debugfs, sysctl, ... (see the debugfs sketch just after this list)
You could register a char device
You can make use of the Netlink interface
You could create new syscalls, although that's heavily discouraged
And you can always come up with your own scheme, or use some less common APIs
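As an example of the first option, a bare-bones debugfs interface might look like this (the demo names are made up; the debugfs calls are the real API, though parts of it have changed return types across kernel versions):

#include <linux/debugfs.h>
#include <linux/module.h>

static struct dentry *demo_dir;
static u32 demo_value;

static int __init demo_init(void)
{
	/* Creates /sys/kernel/debug/demo/value (debugfs must be mounted).
	 * Userspace can then read and write the variable with cat and echo. */
	demo_dir = debugfs_create_dir("demo", NULL);
	debugfs_create_u32("value", 0644, demo_dir, &demo_value);
	return 0;
}

static void __exit demo_exit(void)
{
	debugfs_remove_recursive(demo_dir);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");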
Will a kernel module run faster than a traditional C program?
The kernel is already a C program, most likely compiled with the same compiler you use. So generic algorithms or processor-intensive computations will execute at almost the same speed.
But most userspace programs (like bash) have to ask the kernel to perform operations on system resources, e.g. print the prompt onto the monitor. That requires entering the kernel with a system call, sending data over the tty interfaces and passing it to a video driver, all of which may introduce some latency. If you implemented bash in-kernel, you could call the video driver directly, which is definitely faster.
That approach, however, has drawbacks. First of all, bash should still be able to print its prompt on an ssh session or a serial console, which complicates the logic. Also, if your in-kernel bash hangs, you cannot just kill it; you have to reboot the system.
How can I communicate with a module (is that even possible), for example call a function in it?
In addition to the excellent list provided by tux3, I would suggest starting with char devices.
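The userspace side of the char-device approach is then plain file I/O, for example (assuming a hypothetical /dev/demo node created for your module):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	char buf[64];
	ssize_t n;

	/* open(), read() and close() on the node end up in the module's
	 * file_operations callbacks. */
	int fd = open("/dev/demo", O_RDONLY);
	if (fd < 0) {
		perror("open /dev/demo");
		return 1;
	}

	n = read(fd, buf, sizeof(buf));
	if (n >= 0)
		printf("read %zd bytes from the module\n", n);

	close(fd);
	return 0;
}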

When the linux kernel reports time as spent servicing "soft interrupts", precisely what does it mean?

I have top showing high activity in the %si column for several CPUs. sar reports something similar.
Q1) What are the possibilities for what might be executing in the kernel? It appears that "softirqs" themselves are more or less outdated, and generally function as a mechanism to implement other interfaces, including tasklets, rculists, and I'm not sure what else. I'd like to get a comprehensive list.
Q2) How can I get more precise information on what's actually running as "soft interrupts" on my test system?
As it happens, I have strong suspicions that a particular device driver is involved, since the hard-interrupt % is also high, and only on whichever CPU is presently handling this particular device's interrupt :-) But I haven't so far found anything in the driver source code that looks to me as if it could result in softirq activity. I'm probably missing the obvious, so I'm asking for help ;-)
My kernel is rather outdated, 2.6.32-based (RHEL 6.1, I believe), but I doubt that matters too much for questions this general.
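For reference, /proc/softirqs (available on recent 2.6.3x kernels, including, I believe, this RHEL 6 one) breaks the aggregate %si down by softirq type (TIMER, NET_RX, BLOCK, TASKLET, RCU, ...) per CPU. A trivial snapshot like the sketch below shows which counters are climbing, but it still doesn't tell me which driver code queued the work, hence the question:

#include <stdio.h>
#include <unistd.h>

/* Dumps /proc/softirqs twice, one second apart, so the counters that are
 * climbing while %si is high stand out. */
static void dump_softirqs(void)
{
	char buf[8192];
	size_t n;
	FILE *f = fopen("/proc/softirqs", "r");

	if (!f) {
		perror("/proc/softirqs");
		return;
	}
	while ((n = fread(buf, 1, sizeof(buf), f)) > 0)
		fwrite(buf, 1, n, stdout);
	fclose(f);
}

int main(void)
{
	dump_softirqs();
	sleep(1);
	puts("---- one second later ----");
	dump_softirqs();
	return 0;
}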

Accessing /proc

I'm currently developing an application which needs a lot of system and process information, some of which is only available through /proc, and I have some general questions about accessing the structures.
The application will be run on Linux (kernel >= 2.6), not on any other Unix-flavored OS. It should have access to any data in /proc; I can't say what is necessary now, as the specifications are not clear yet, but the whole /proc directory is relevant to the application.
First of all: is there good documentation that covers the features added/removed from kernel version to kernel version? One thing I'm curious about in particular is the format of the individual files. Can I take that for granted? Does it change between kernel versions?
Hooking up the parsing process based on the kernel version wouldn't be a problem at all; it's just that I couldn't find any good docs on what has changed from version to version, which could help me catch parsing errors beforehand.
In addition: is there a definitive list of features that can be activated/deactivated by kernel options (except, of course, the /proc feature itself)? I'm looking for a list of files/directories that only exist with the appropriate options set in the kernel.
As an example of what I'm thinking of, this is a link to the proc manpage (http://linux.die.net/man/5/proc), which includes a lot of good information; e.g. some entries include the earliest kernel version they were available in, and some note whether a module needs to be loaded. It does not describe the output format of all the information, though, which is something I need if I want to parse it (e.g. whether it is consistent across kernel versions or changed at some point).
The second thing I'm wondering about is what happens if the process being queried dies while I'm querying it. What is my time window? For example, if I fetch a list of processes by reading all the structures and parse them one after another, what happens if process x dies before I get to read it? Even if I check that the directory exists, it could still be gone one call later.
Last but not least: Is there any major distribution out there that is not mounting proc?
From what I understand, a lot of common tools are based on the /proc interface such as lsmod or free, so I'm guessing that I can expect /proc to exist almost always.
The /proc interfaces are pretty stable (unlike the /sys interfaces), even if nothing is guaranteed. Almost all changes are backwards compatible, at least if they've been around for a few versions. You should stick to the documented interfaces to be safe. If a file exists, its format may be extended in later versions, but normally in a backwards-compatible way, e.g. adding columns to a table. The parts that are most at risk of disappearing are those concerning hardware subsystems such as ACPI or SCSI, which are migrating to /sys (with a long transition period when both exist).
Most of the information is architecture-independent, except for hardware information (e.g. /proc/cpuinfo has very different fields on different architectures).
The main documentation is Documentation/filesystems/proc.txt in the kernel source. Consider proc(5) to be the overview and proc.txt to be the fine details. The kernel documentation is often incomplete, so don't be surprised if you need to resort to reading the source sometimes.
Most optional parts of /proc are activated by default if the driver whose data it exposes is included in the kernel. The exceptions are mostly related to hardware features that rarely need to be accessed from outside the kernel; if you need to access these features, you're probably already expecting to need to dig deeper. Look through Kconfig files in the kernel source for detailed information.
Process data (or hardware data related to removable hardware or provided by unloadable modules) can disappear under your nose. Most files under /proc can be read atomically, with a single read call with a reasonably-sized buffer; if you perform multiple read calls in sequence, drivers are supposed to guarantee that you get well-formed data. There is no way to guarantee atomicity between reads of separate files; if you're reading information about a process, this process can die at any time, and in principle could even be replaced by another process with the same PID before you're finished.
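For instance, a single-read fetch of /proc/<pid>/stat, with the vanishing-process case handled, might look like this (buffer size and error handling are only illustrative):

#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Reads /proc/<pid>/stat with one read() so the snapshot is self-consistent.
 * Returns 0 on success, -1 if the read failed, e.g. because the process
 * disappeared between listing it and opening or reading its stat file. */
static int read_proc_stat(pid_t pid, char *buf, size_t bufsize)
{
	char path[64];
	ssize_t n;
	int fd;

	snprintf(path, sizeof(path), "/proc/%d/stat", (int)pid);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;      /* already gone: open() fails with ENOENT */

	n = read(fd, buf, bufsize - 1);
	close(fd);
	if (n <= 0)
		return -1;      /* died between open() and read() */

	buf[n] = '\0';
	return 0;
}

int main(void)
{
	char buf[4096];         /* plenty for a single stat line */

	if (read_proc_stat(getpid(), buf, sizeof(buf)) == 0)
		fputs(buf, stdout);
	return 0;
}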
As it says in the description of /proc, “everyone should say Y here”. All desktop/server Linux systems and most embedded Linux systems must have /proc; a lot of things, including ps and other process management commands, many filesystem and device-related tools, and module loading, require it. The only systems that might be able to dispense with /proc are very small single-purpose embedded systems that support a single hardware configuration and run a fixed set of programs. You can count on it being there.

Resources