Data-link layer libraries in the Linux kernel

I am implementing a routing protocol. For it to work, I need to know about failures at the data-link layer. Are there libraries, independent of the underlying data-link layer protocol, that give me hooks (like netfilter) to capture such information?
Since this is an experiment on the protocol, I'm trying to find out whether anything already exists so that it can be implemented in user space rather than writing a kernel module (I'm totally new to kernel programming).
Any pointers would be really helpful.

Just a guess:
You can look at the sysfs entries for your network interface (assuming sysfs is configured in your kernel), like:
cat /sys/class/net/eth0/carrier   # link carrier status
1
cat /sys/class/net/eth0/operstate # also related, though I forget exactly what the values mean
up
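If polling those files from user space is enough for your experiment, a minimal C sketch could look like this (the interface name and the one-second poll interval are arbitrary choices; a real routing daemon would more likely listen for rtnetlink link events instead of polling):

/* poll_carrier.c - poll /sys/class/net/<ifname>/carrier once per second */
#include <stdio.h>
#include <unistd.h>

/* Returns 1 if carrier is present, 0 if not, -1 on error
 * (e.g. interface missing, or carrier unreadable while the
 * interface is administratively down). */
static int read_carrier(const char *ifname)
{
    char path[128];
    char buf[4] = {0};
    FILE *f;

    snprintf(path, sizeof(path), "/sys/class/net/%s/carrier", ifname);
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(buf, sizeof(buf), f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return buf[0] == '1';
}

int main(void)
{
    for (;;) {
        printf("eth0 carrier: %d\n", read_carrier("eth0"));
        sleep(1);
    }
    return 0;
}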

Related

ioctl vs kernel modules in Linux

I know that kernel modules are used to write device drivers. You can add new system calls to the Linux kernel and use them to communicate with other devices.
I also read that ioctl is a system call used in linux to implement system calls which are not available in the kernel by default.
My question is, why wouldn't you just write a new kernel module for your device instead of using ioctl? Why would ioctl be useful where kernel modules exist?
You will need to write a kernel driver in either case, but you can choose between adding a new syscall and adding an ioctl.
Let's say you want to add a feature to get the tuner settings for a video capturing device.
If you implement it as a syscall:
You can't just load a module, you need to change the kernel itself
Hundreds of drivers could each add dozens of syscalls, cluttering the syscall table with thousands of global functions that must be kept forever.
For the driver to have any reach, you will need to convince kernel maintainers that this burden is worthwhile.
You will need to upstream the definition into glibc, and people must upgrade before they can write programs for it
If you implement it as an ioctl:
You can build your module for an existing kernel and let users load it, without having to get kernel maintainers involved
All functions are simple per-driver constants in the applicable header file, where they can easily be added or removed
Everyone can start programming with it just by including the header
Since an ioctl is much easier, more flexible, and exactly meant for these driver-specific function calls, it is generally the preferred method.
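As an illustration of that last point, here is a rough sketch of what the hypothetical "get tuner settings" feature from above could look like as an ioctl. All names here (mydev_*) are made up, and a real driver would need the usual char-device registration around it:

/* mydev_ioctl.h - hypothetical header shared by the driver and userspace */
#include <linux/ioctl.h>

struct mydev_tuner_settings {
    unsigned int frequency_khz;
    unsigned int gain;
};

#define MYDEV_IOC_MAGIC 'M'
#define MYDEV_GET_TUNER _IOR(MYDEV_IOC_MAGIC, 1, struct mydev_tuner_settings)

/* Driver side (needs <linux/fs.h> and <linux/uaccess.h>), hooked up through
 * the .unlocked_ioctl member of the driver's struct file_operations: */
static long mydev_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
    struct mydev_tuner_settings s;

    switch (cmd) {
    case MYDEV_GET_TUNER:
        s.frequency_khz = 101300;   /* in reality, read from the hardware */
        s.gain = 42;
        if (copy_to_user((void __user *)arg, &s, sizeof(s)))
            return -EFAULT;
        return 0;
    default:
        return -ENOTTY;             /* not one of ours */
    }
}

Userspace then just does open("/dev/mydev", O_RDWR) followed by ioctl(fd, MYDEV_GET_TUNER, &settings); the header above is the whole "API definition", with no syscall table or glibc changes involved.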
I also read that ioctl is a system call used in linux to implement system calls which are not available in the kernel by default.
This is incorrect.
System calls are (for Linux) listed in syscalls(2) (there are hundreds of them between user space and kernel land), and ioctl(2) is one of them. Read also the Wikipedia pages on ioctl and on the Unix philosophy, and the Linux Assembly HOWTO.
In practice, ioctl is mostly used on device files, and used for things which are not a read(2) or a write(2) of bytes.
For example, a sound is made by writing bytes to /dev/audio, but to change the volume you'll use some ioctl. See also fcntl(2) playing a similar role.
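A concrete everyday example of such a "not a read or write" operation is querying a terminal's window size. The following is plain userspace C using the standard TIOCGWINSZ ioctl of the tty driver (nothing driver-specific is assumed here):

/* winsize.c - ask the tty driver for something that isn't a byte stream */
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
    struct winsize ws;

    if (ioctl(STDOUT_FILENO, TIOCGWINSZ, &ws) == -1) {
        perror("ioctl(TIOCGWINSZ)");
        return 1;
    }
    printf("%u rows x %u columns\n", ws.ws_row, ws.ws_col);
    return 0;
}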
Input/output could also happen (somewhat indirectly) through mmap(2) and related virtual address space system calls.
For much more, read Advanced Linux Programming and Operating Systems: Three Easy Pieces. Look at OSDev for hints about coding your own OS.
A kernel module could implement new devices, new ioctls, etc. See kernelnewbies for more. I tend to believe it might sometimes add a few new syscalls (but this was not possible in older Linux kernels such as the 3.x series).
Linux is mostly open source: please download the source code and look inside it. See also Linux From Scratch.
IIRC, Linux kernel 1.0 did not have any kernel modules. But that was back in 1994.

Linux kernel: Find all drivers reachable via syscalls

I am comparing a mainline Linux kernel source with a modified copy of the same source that has many drivers added. A little background: That modified source is an Android kernel source, it contains many drivers added by the vendor, SoC manufacturer, Google etc.
I am trying to identify all drivers added in the modified source that are reachable from userspace via any syscalls. I'm looking for some systematic or ideally automatic way to find all these to avoid the manual work.
For example, char device drivers are of interest, since I could perform some openat, read, write, ioctl and close syscalls on them if there is a corresponding device file. To find new character device drivers, I could first find all new files in the source tree and then grep them for struct file_operations. But besides char drivers, what else is there that I need to look for?
I know that the syscalls mentioned above do some kind of "forwarding" to the respective device driver associated with the file. But are there other syscalls that do this kind of forwarding? I think I would have to focus on all these syscalls, right?
Is there something I can grep for in source files that indicates that syscalls can lead there? How should I go about this to find all these drivers?
Update (narrowing down):
I am targeting specific devices (e.g. Huawei P20 Lite), so I know relevant architecture and hardware. But for the sake of this question, we can just assume that hardware for whatever driver is present. It doesn't really matter in my case if I invoked a driver and it reported back that no corresponding hardware is present, as long as I can invoke the driver.
I am only looking for drivers that are directly reachable via syscalls. By directly reachable I mean drivers designed to have some syscall interface with userspace. Yes, syscalls not aimed at a certain driver may still indirectly trigger code in that driver, but such indirect effects can be neglected.
Maybe some background on my objective clarifies: I want to fuzz-test the found drivers using Syzkaller. For this, I would create descriptions of the syscalls usable to fuzz each driver that Syzkaller parses.
I'm pretty sure there is no way to do this programmatically. Any attempt to do so would run up against a couple of problems:
The drivers that are called in a given case depend on the hardware. For example, on my laptop, the iwlwifi driver will be reachable via network syscalls, but on a server that driver won't be used.
Virtually any code loaded into the kernel is reachable from some syscall if the hardware is present. Drivers interact with hardware, which in turn either interacts with users, external devices, or networks, and all of these operations are reachable by syscalls. People don't write drivers that don't do anything.
Even drivers that aren't directly reachable by a system call can affect execution. For example, a driver for a true RNG would be able to affect execution by changing the behavior of the system PRNG, even if it weren't accessible by /dev/hwrng.
So for a generic kernel that can run on any hardware of a given architecture, it's going to be pretty hard to exclude any driver from consideration. If your hope is to trace the execution of the code by some programmatic means without actually executing it, then you're going to need to solve the halting problem.
Sorry for the bad news.

PCIe device discovery algorithm pseudo code

I have a PCIe model written in SystemVerilog, although I think this question is language agnostic. The model performs PCIe configuration reads and writes and memory reads and writes perfectly in simulation. However, what I need to do is "discover" my PCIe device and configure my config space registers in simulation. Is there a boilerplate chunk of pseudo code representing the Linux PCIe enumeration process to which I can just add my own model's transaction functions, so that I can get a bus walk followed by BAR programming, SR-IOV enable if discovered, and MSI-X configuration? This seems like a common exercise for a PCIe device, so maybe there is a model.
It isn't terribly difficult to do. Basically, you loop through config space, checking each possible device on the first root bus 0. When a device is found, you allocate a memory space for it based on its requested size and program the BARs accordingly. If you find any bridges, you also configure and enable them; the basic bridge registers for this are standard. This includes assigning the upstream and downstream bus numbers, which then allows you to enumerate the new downstream bus, and so on.
I had to do this once to access a PCI I/O card on a system that had no OS or other software environment. It wasn't too bad and that was across two bridges from two vendors, as well as the I/O card registers and the CPU bus root bridge setup. This was PCI, not PCIe, but it would be very much the same. You could even do it with completely hard-coded numbers if the hardware never changed, but in my case there were a couple variants so I actually had to do some simple enumeration to find the device numbers dynamically. One gotcha is that you may have to delay a bit, or retry, to give all the devices time to come online before you try to access them.
In doing that I found this book to be invaluable: PCI System Architecture (4th Edition). I notice there is also a version for PCIe: PCI Express System Architecture (1st Edition). I would definitely get one of those if you haven't already. These books contain detailed algorithms and explanations about how to do all of this. At the time I didn't really use or refer to any code to speak of, but...
The best code resource I have found is U-Boot. It operates at a similarly low level, is totally self-contained, and is still fairly small and as simple as possible. For example, the enumeration appears to start with the function pci_init(), which calls a board-specific pci_xxx_init(). This then sets up the root bridge and calls pci_hose_scan_bus() in drivers/pci/pci.c to do the real work. Also check out the routines in drivers/pci/pci_auto.c, as well as the rest of that folder.
For your task you probably only need a very small subset and could just hack out parts of these files into a simple driver. Basically a for() loop and some pci_read/write_config() calls with logic to recognize your device and bridge IDs.
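To make that concrete, here is a rough C-style sketch of a single-bus walk with BAR sizing. cfg_read32()/cfg_write32() stand in for your model's configuration transactions (bus/device/function/register offset); 64-bit BARs, I/O BARs, bridges, SR-IOV and MSI-X are all left out for brevity:

#include <stdint.h>

/* Provided by your simulation model - hypothetical stand-ins. */
uint32_t cfg_read32(int bus, int dev, int fn, int off);
void     cfg_write32(int bus, int dev, int fn, int off, uint32_t val);

/* Walk root bus 0, size and program 32-bit memory BARs, enable devices. */
void scan_bus0(uint32_t next_mem_base)
{
    for (int dev = 0; dev < 32; dev++) {
        for (int fn = 0; fn < 8; fn++) {
            uint32_t id = cfg_read32(0, dev, fn, 0x00);   /* vendor/device ID */
            if ((id & 0xFFFF) == 0xFFFF)
                continue;                                 /* nothing here */

            /* Size and program the six 32-bit BARs at offsets 0x10..0x24. */
            for (int bar = 0; bar < 6; bar++) {
                int off = 0x10 + bar * 4;
                cfg_write32(0, dev, fn, off, 0xFFFFFFFF);
                uint32_t sz = cfg_read32(0, dev, fn, off);
                if (sz == 0 || sz == 0xFFFFFFFF)
                    continue;                             /* BAR not implemented */
                if (sz & 0x1)
                    continue;                             /* I/O BAR: skipped in this sketch */
                uint32_t size = ~(sz & 0xFFFFFFF0u) + 1;  /* requested size */
                next_mem_base = (next_mem_base + size - 1) & ~(size - 1); /* align */
                cfg_write32(0, dev, fn, off, next_mem_base);
                next_mem_base += size;
            }

            /* Enable memory decoding and bus mastering in the command register. */
            uint32_t cmd = cfg_read32(0, dev, fn, 0x04);
            cfg_write32(0, dev, fn, 0x04, cmd | 0x6);

            /* If function 0 is not multi-function, skip functions 1..7. */
            if (fn == 0 && !(cfg_read32(0, dev, fn, 0x0C) & 0x00800000))
                break;
        }
    }
}

Bridge handling would add a check of the header type, programming of the primary/secondary/subordinate bus numbers and memory windows, and a recursive scan of the secondary bus, much as pci_hose_scan_bus() does in U-Boot.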

Why doesn't configfs support mmap?

I'm developing a linux kernel module for an embedded system.
The system contains programmable logic (PL), which needs to be accessed from userspace processes.
The PL can change at runtime.
My module allows processes to access specified hw registers and pages.
These mappings are configured (at runtime) in the configfs binding of my module.
Every mapping gets an entry in configfs through which it is accessible.
I would like to allow processes to mmap whole pages, so they're able to communicate directly with the PL.
But configfs doesn't support mmap.
Is there a reason why?
Sysfs supports mmap, so I see no reason why configfs shouldn't.
A solution would be to mirror my configfs tree into sysfs,
but this defeats the whole reason to use configfs... Any ideas?
configfs is not a replacement for sysfs. In fact, it can be viewed as an opposite to sysfs.
sysfs provides a view of the kernel's objects through the filesystem interface. It can be used to change things in those objects or trigger actions on them, but that is not its main purpose. The major point here is that each object represented in sysfs is created and destroyed by the kernel. The kernel controls the lifecycle of the sysfs representation, and sysfs is just a window onto all this.
configfs, on the other hand, provides a way to create or change kernel objects through the filesystem interface. That's a fundamental difference. A user-space process can create directories within configfs. That action causes a callback execution within the kernel and the creation of the corresponding kernel object. The files in the directory then represent the states of the various components of that object.
I suspect that due to the nature of data exchange between the kernel and a user space process in these two cases it was deemed unnecessary to have mmap support in configfs.
Without seeing the design/architecture of your system it's difficult to say something definitive in your case. From your description it appears that sysfs may be what you need to satisfy the desired goals. All objects that you need access to are created, modified, and destroyed from the kernel. Limited settings/changes to existing kernel structures/objects in your module can be done through the sysfs interface. Then again, it may well be that you would want both sysfs and configfs interfaces in your module, each for its specific purpose. There's nothing wrong with that if it makes things cleaner and clearer.
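To illustrate the sysfs option: a binary attribute can carry an mmap callback, so a rough sketch of exposing one PL register page might look like the following. pl_page_phys and the attribute name are placeholders, the exact bin_attribute callback signatures vary a bit between kernel versions, and a real driver would hang the attribute off its own device kobject rather than kernel_kobj:

#include <linux/kobject.h>
#include <linux/mm.h>
#include <linux/module.h>
#include <linux/sysfs.h>

static phys_addr_t pl_page_phys;   /* physical address of the PL page, set elsewhere */

static int pl_mmap(struct file *filp, struct kobject *kobj,
                   struct bin_attribute *attr, struct vm_area_struct *vma)
{
    /* Map the registers uncached so accesses hit the hardware directly. */
    vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
    return vm_iomap_memory(vma, pl_page_phys, PAGE_SIZE);
}

static struct bin_attribute pl_attr = {
    .attr = { .name = "pl_page", .mode = 0600 },
    .size = PAGE_SIZE,
    .mmap = pl_mmap,
};

static int __init pl_sysfs_init(void)
{
    /* Appears as /sys/kernel/pl_page in this toy example. */
    return sysfs_create_bin_file(kernel_kobj, &pl_attr);
}

static void __exit pl_sysfs_exit(void)
{
    sysfs_remove_bin_file(kernel_kobj, &pl_attr);
}

module_init(pl_sysfs_init);
module_exit(pl_sysfs_exit);
MODULE_LICENSE("GPL");

That split would keep configfs purely for configuring the mappings and sysfs (or a char device) for the actual data path, which matches the distinction described above.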

Packet filtering with Netfilter's NFQUEUE vs. Berkeley Packet Filter (BPF)

I've just read in these answers about two options for developing packet filters in Linux.
The first is using iptables and netfilter, probably with NFQUEUE and the libnetfilter_queue library.
The second is using BPF (Berkeley Packet Filter), which at a quick reading seems to have similar filtering capabilities.
So, which of these alternatives is the better way to create a packet filter? What are the differences? My software is going to run as a gateway proxy, or "man in the middle", that should receive a packet from one computer (with a destination address of another one, not the filter's local address) and send it out after some filtering.
Thanks a lot!
Though my understanding is limited to the theoretical, I've done some reading while debugging the Kubernetes networking implementation and can thus take a stab at answering this.
Broadly, both netfilter (via nftables) and eBPF (the successor to BPF) implement a virtual machine that executes some logic while processing packets. netfilter's implementation appears to strive for compatibility with the previous iptables implementation, being essentially a more performant successor to iptables.
However, there are still performance problems when using iptables, particularly when there are large sets of iptables rules. The way eBPF is structured can alleviate some of these performance problems; specifically:
eBPF can be offloaded to a "smart nic"
eBPF can be structured to look up rules more efficiently
Though it was initially used for network processing, eBPF is now also used for kernel instrumentation (sysdig, iovisor). It has a far larger set of use cases, but because of this it likely has a much steeper learning curve.
So, in summary:
Use what you're familiar with, unless you hit performance problems; then
look at eBPF (a minimal example follows)
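If you do end up looking at eBPF, the earliest place it can hook packet processing is XDP, and the smallest possible program is only a few lines. This sketch assumes clang with -target bpf and the libbpf headers, and can be attached with something like ip link set dev eth0 xdp obj xdp_pass.o sec xdp:

/* xdp_pass.c - smallest possible XDP program: let every packet through.
 * Replace XDP_PASS with XDP_DROP (or real parsing logic) to start filtering. */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int xdp_pass(struct xdp_md *ctx)
{
    return XDP_PASS;
}

char _license[] SEC("license") = "GPL";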
Relevant:
https://cilium.io/blog/2018/11/20/fb-bpf-firewall/
https://www.youtube.com/watch?v=4-pawkiazEg
https://ferrisellis.com/posts/ebpf_past_present_future/
https://lwn.net/Articles/747551/
Notes:
eBPF is the successor to cBPF, and has replaced it in the kernel
I refer to eBPF explicitly here out of habit
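For completeness, the NFQUEUE route mentioned in the question looks roughly like this in userspace: an iptables rule such as iptables -A FORWARD -j NFQUEUE --queue-num 0 diverts forwarded packets to queue 0, and a libnetfilter_queue program issues a verdict on each one. Error handling is trimmed here, and a real filter would inspect the packet via nfq_get_payload() before deciding:

/* nfq_filter.c - accept everything arriving on NFQUEUE queue 0.
 * Build with: gcc nfq_filter.c -lnetfilter_queue */
#include <stdint.h>
#include <stdio.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <linux/netfilter.h>            /* NF_ACCEPT, NF_DROP */
#include <libnetfilter_queue/libnetfilter_queue.h>

static int cb(struct nfq_q_handle *qh, struct nfgenmsg *nfmsg,
              struct nfq_data *nfa, void *data)
{
    struct nfqnl_msg_packet_hdr *ph = nfq_get_msg_packet_hdr(nfa);
    uint32_t id = ntohl(ph->packet_id);

    /* Filtering decision goes here; NF_DROP discards the packet. */
    return nfq_set_verdict(qh, id, NF_ACCEPT, 0, NULL);
}

int main(void)
{
    char buf[65536];
    struct nfq_handle *h = nfq_open();
    struct nfq_q_handle *qh;
    int fd, n;

    nfq_unbind_pf(h, AF_INET);
    nfq_bind_pf(h, AF_INET);
    qh = nfq_create_queue(h, 0, &cb, NULL);
    nfq_set_mode(qh, NFQNL_COPY_PACKET, 0xffff);  /* copy full packets to userspace */

    fd = nfq_fd(h);
    while ((n = recv(fd, buf, sizeof(buf), 0)) >= 0)
        nfq_handle_packet(h, buf, n);

    nfq_destroy_queue(qh);
    nfq_close(h);
    return 0;
}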

Resources