Debugging a kernel hang caused by IOCTL calls - Linux

I am trying to port a kernel module that works on the 2.6.32 kernel so that it also works on the 3.6 kernel. We use IOCTL calls to update structures in the kernel module, and these calls work fine on 2.6.32.
When I try the same on the 3.6 kernel, the kernel hangs whenever ioctl calls are made from the user-space application. It's a socket-based interface, not a file-based one, so we use the ioctl member of struct proto_ops.
How can I debug this scenario, given that no core dump is generated? To copy data from userspace I am using copy_from_user.
Any pointers for debugging this would be very helpful.

ioctl() was one of the last remaining parts of the kernel to run under the Big Kernel Lock (BKL). In the past, the use of the BKL made it possible for long-running ioctl() methods to create long latencies for unrelated processes.
What follows is an explanation of the patch that introduced unlocked_ioctl and compat_ioctl in 2.6.11. The removal of the ioctl field happened much later, in 2.6.36.
Explanation: When ioctl was executed, it took the Big Kernel Lock (BKL), so nothing else could execute at the same time. This is very bad on a multiprocessor machine, so there was a big effort to get rid of the BKL. First, unlocked_ioctl was introduced. It lets each driver writer choose what lock to use instead. This can be difficult, so there was a period of transition during which old drivers still worked (using ioctl) but new drivers could use the improved interface (unlocked_ioctl). Eventually all drivers were converted and ioctl could be removed.
compat_ioctl is actually unrelated, even though it was added at the same time. Its purpose is to allow 32-bit userland programs to make ioctl calls on a 64-bit kernel. The meaning of the last argument to ioctl depends on the driver, so there is no way to do a driver-independent conversion.
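As a rough illustration of the migration (a sketch with hypothetical mydrv_* names, shown for the file_operations case the patch covers rather than the proto_ops case from the question): with unlocked_ioctl the driver takes its own lock, for example a private mutex, instead of relying on the BKL:

#include <linux/module.h>
#include <linux/fs.h>
#include <linux/mutex.h>
#include <linux/errno.h>

static DEFINE_MUTEX(mydrv_lock);    /* driver-private lock replacing the implicit BKL */

static long mydrv_unlocked_ioctl(struct file *file, unsigned int cmd,
                                 unsigned long arg)
{
    long ret = 0;

    mutex_lock(&mydrv_lock);
    switch (cmd) {
    /* ... dispatch on cmd, using copy_from_user()/copy_to_user() as before ... */
    default:
        ret = -ENOTTY;
    }
    mutex_unlock(&mydrv_lock);
    return ret;
}

static const struct file_operations mydrv_fops = {
    .owner          = THIS_MODULE,
    .unlocked_ioctl = mydrv_unlocked_ioctl,   /* the old .ioctl field no longer exists in 3.x */
};

For a socket interface the same idea applies: whatever serialization the BKL used to provide implicitly must now come from the driver's own locking.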
Reference: The new way of ioctl() by Jonathan Corbet

Related

ioctl vs kernel modules in Linux

I know that kernel modules are used to write device drivers. You can add new system calls to the Linux kernel and use them to communicate with other devices.
I also read that ioctl is a system call used in Linux to implement system calls which are not available in the kernel by default.
My question is: why wouldn't you just write a new kernel module for your device instead of using ioctl? Why would ioctl be useful where kernel modules exist?
You will need to write a kernel driver in either case, but you can choose between adding a new syscall and adding an ioctl.
Let's say you want to add a feature to get the tuner settings for a video capturing device.
If you implement it as a syscall:
You can't just load a module; you need to change the kernel itself
Hundreds of drivers could each add dozens of syscalls, kludging up the table with thousands of global functions that must be kept forever.
For the driver to have any reach, you will need to convince the kernel maintainers that this burden is worthwhile.
You will need to upstream the definition into glibc, and people must upgrade before they can write programs for it
If you implement it as an ioctl:
You can build your module for an existing kernel and let users load it, without having to get kernel maintainers involved
All functions are simple per-driver constants in the applicable header file, where they can easily be added or removed
Everyone can start programming with it just by including the header
Since an ioctl is much easier, more flexible, and exactly meant for all these driver-specific function calls, it is generally the preferred method.
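As a sketch of what the ioctl route looks like (the mydrv names and the tuner structure here are made up for illustration): the command is just a constant in a shared header, and user space only needs that header plus the ordinary ioctl(2) call:

/* shared header, e.g. mydrv_ioctl.h (hypothetical) */
#include <linux/ioctl.h>

struct mydrv_tuner_settings {
    unsigned int frequency_khz;
    unsigned int gain;
};

#define MYDRV_IOC_MAGIC   'M'
#define MYDRV_GET_TUNER   _IOR(MYDRV_IOC_MAGIC, 1, struct mydrv_tuner_settings)

/*
 * user space: no new syscall, no glibc change needed
 *
 *   int fd = open("/dev/mydrv0", O_RDWR);
 *   struct mydrv_tuner_settings ts;
 *   ioctl(fd, MYDRV_GET_TUNER, &ts);
 */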
I also read that ioctl is a system call used in Linux to implement system calls which are not available in the kernel by default.
This is incorrect.
System calls are (for Linux) listed in syscalls(2) (there are hundreds of them between user space and kernel land), and ioctl(2) is one of them. Read also the Wikipedia pages on ioctl and on the Unix philosophy, and the Linux Assembly HOWTO.
In practice, ioctl is mostly used on device files, and used for things which are not a read(2) or a write(2) of bytes.
For example, sound is made by writing bytes to /dev/audio, but to change the volume you'll use some ioctl. See also fcntl(2), which plays a similar role.
Input/output could also happen (somewhat indirectly...) through mmap(2) and related virtual address space system calls.
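A small self-contained user-space example of that idea: querying the terminal size is done through an ioctl (TIOCGWINSZ), not through read(2) or write(2):

#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
    struct winsize ws;

    if (ioctl(STDOUT_FILENO, TIOCGWINSZ, &ws) == -1) {
        perror("ioctl(TIOCGWINSZ)");
        return 1;
    }
    printf("terminal: %u rows x %u cols\n",
           (unsigned)ws.ws_row, (unsigned)ws.ws_col);
    return 0;
}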
For much more, read Advanced Linux Programming and Operating Systems: Three Easy Pieces. Look into Osdev for more hints about coding your own OS.
A kernel module could implement new devices, new ioctls, etc. See kernelnewbies for more. I tend to believe it might sometimes add a few new syscalls (but this was not possible in older Linux kernels like the 3.x ones).
Linux is mostly open source. Please download it and look inside the source code. See also Linux From Scratch.
IIRC, Linux kernel 1.0 did not have any kernel modules. But that was back around 1994.

Linux kernel: Find all drivers reachable via syscalls

I am comparing a mainline Linux kernel source with a modified copy of the same source that has many drivers added. A little background: that modified source is an Android kernel source; it contains many drivers added by the vendor, the SoC manufacturer, Google, etc.
I am trying to identify all drivers added in the modified source that are reachable from userspace via any syscalls. I'm looking for some systematic or ideally automatic way to find all these to avoid the manual work.
For example, char device drivers are of interest, since I could perform some openat, read, write, ioctl and close syscalls on them if there is a corresponding device file. To find new character device drivers, I could first find all new files in the source tree and then grep them for struct file_operations. But besides char drivers, what else is there that I need to look for?
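For reference, a minimal sketch of what a character-device registration typically looks like (hypothetical mydrv names); patterns such as struct file_operations, alloc_chrdev_region()/cdev_add(), register_chrdev() and misc_register() are the kind of markers grepping would pick up:

#include <linux/module.h>
#include <linux/fs.h>
#include <linux/cdev.h>
#include <linux/errno.h>

static long mydrv_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
    return -ENOTTY;     /* a real driver dispatches on cmd here */
}

static const struct file_operations mydrv_fops = {
    .owner          = THIS_MODULE,
    .unlocked_ioctl = mydrv_ioctl,
};

static dev_t mydrv_devt;
static struct cdev mydrv_cdev;

static int __init mydrv_init(void)
{
    int ret = alloc_chrdev_region(&mydrv_devt, 0, 1, "mydrv");
    if (ret)
        return ret;
    cdev_init(&mydrv_cdev, &mydrv_fops);
    return cdev_add(&mydrv_cdev, mydrv_devt, 1);
}

static void __exit mydrv_exit(void)
{
    cdev_del(&mydrv_cdev);
    unregister_chrdev_region(mydrv_devt, 1);
}

module_init(mydrv_init);
module_exit(mydrv_exit);
MODULE_LICENSE("GPL");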
I know that the syscalls mentioned above do some kind of "forwarding" to the respective device driver associated with the file. But are there other syscalls that do this kind of forwarding? I think I would have to focus on all these syscalls, right?
Is there something I can grep for in source files that indicates that syscalls can lead there? How should I go about this to find all these drivers?
Update (narrowing down):
I am targeting specific devices (e.g. the Huawei P20 Lite), so I know the relevant architecture and hardware. But for the sake of this question, we can just assume that the hardware for whatever driver is present. It doesn't really matter in my case if I invoke a driver and it reports back that no corresponding hardware is present, as long as I can invoke the driver.
I am only looking for drivers directly reachable via syscalls. By directly reachable I mean drivers designed to have some syscall interface with userspace. Yes, syscalls not aimed at a certain driver may still indirectly trigger code in that driver, but these indirect effects can be neglected.
Maybe some background on my objective clarifies: I want to fuzz-test the found drivers using Syzkaller. For this, I would create descriptions of the syscalls usable to fuzz each driver that Syzkaller parses.
I'm pretty sure there is no way to do this programmatically. Any attempt to do so would run up against a couple of problems:
The drivers that are called in a given case depend on the hardware. For example, on my laptop, the iwlwifi driver will be reachable via network syscalls, but on a server that driver won't be used.
Virtually any code loaded into the kernel is reachable from some syscall if the hardware is present. Drivers interact with hardware, which in turn either interacts with users, external devices, or networks, and all of these operations are reachable by syscalls. People don't write drivers that don't do anything.
Even drivers that aren't directly reachable by a system call can affect execution. For example, a driver for a true RNG would be able to affect execution by changing the behavior of the system PRNG, even if it weren't accessible by /dev/hwrng.
So for a generic kernel that can run on any hardware of a given architecture, it's going to be pretty hard to exclude any driver from consideration. If your hope is to trace the execution of the code by some programmatic means without actually executing it, then you're going to need to solve the halting problem.
Sorry for the bad news.

Shutdown (embedded) Linux from kernel-space

I'm working on a modified version of the 2.6.35 kernel for Olinuxino, an ARM9 based platform. I'm trying to modify the power management driver (the architecture specific part).
The processor is a Freescale i.MX23. This processor has a "special" pin, called PSWITCH, that triggers an interrupt that is handled by the power management driver.
If the switch is pressed, the system goes to standby. This is done in the driver by calling pm_suspend(PM_SUSPEND_STANDBY).
Given my hardware setup, I'd like to, instead, shutdown the system.
So my question is:
What is the preferred way for a kernel-space process to trigger a clean system halt/poweroff?
I suppose there's a nice little function call out there, but I couldn't find it so far.
My kernel code (the file I'm working on is arch/arm/mach-mx23/pm.c) can be found here: github.com/spairal/linux-for-lobster, though my question calls for a general Linux kernel approach.
The most general way would be for your driver to invoke shutdown as a userspace helper:
static char *shutdown_argv[] =
    { "/sbin/shutdown", "-h", "-P", "now", NULL };

/* argv is char ** (not const) to match call_usermodehelper()'s signature */
call_usermodehelper(shutdown_argv[0], shutdown_argv, NULL, UMH_NO_WAIT);
(Presuming you have a /sbin/shutdown binary installed.) This will shut userspace down cleanly, unmount filesystems and then ask the kernel to shut down and power off.
However, you may be able to do better than this - for example, if you can guarantee that there are no disk filesystems mounted read/write, you could tell a kernel thread to invoke the kernel_power_off() function (it shouldn't be done from interrupt context).
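One possible way to respect that restriction (a sketch, not taken from the i.MX23 code): defer the call to process context with a work item, and only schedule it from the PSWITCH interrupt handler:

#include <linux/workqueue.h>
#include <linux/reboot.h>

static void poweroff_work_func(struct work_struct *work)
{
    kernel_power_off();         /* runs in process context, not in the IRQ handler */
}
static DECLARE_WORK(poweroff_work, poweroff_work_func);

/*
 * in the PSWITCH interrupt handler, instead of pm_suspend(PM_SUSPEND_STANDBY):
 *
 *     schedule_work(&poweroff_work);
 */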

Pinning a pthread to a single core

I am trying to measure the performance of some library calls. My primary measurement tool is the rdtsc instruction. After doing some reading I realize that I need to disable preemption and interrupts in order to get the most accurate readings. Can someone help me figure out how to do this? I know that pthreads have a 'set affinity' mechanism. Is that enough to get the job done?
I also read somewhere that I can make calls into the kernel of the sort
preempt_disable()
raw_local_irq_save(...)
Is there any benefit to using one approach over the other? I tried the latter approach and got this error.
error: 'preempt_disable' was not declared in this scope
which can be fixed by including linux/preempt.h but the compiler still complains.
linux/preempt.h: No such file or directory
Obviously I have not done any kernel hacking and I could not find this file anywhere on my system. I am really hoping I won't have to install a new Linux kernel. :)
Thanks for your input.
Pinning a pthread to a single CPU can be done using pthread_setaffinity_np.
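For example, a minimal sketch that pins the calling thread to CPU 0 (compile with -pthread):

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    cpu_set_t set;
    int err;

    CPU_ZERO(&set);
    CPU_SET(0, &set);                   /* pin the calling thread to CPU 0 */

    err = pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    if (err != 0) {
        fprintf(stderr, "pthread_setaffinity_np: %s\n", strerror(err));
        return 1;
    }

    /* ... run the code to be measured here ... */
    return 0;
}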
But what you want to achieve in the end is not so simple. I'll explain why.
preempt.h is part of the Linux kernel source; it's located here. You need to have the kernel sources with you. In any case, you need to write a kernel module to use it; you cannot use it from user space. Learn how to write a kernel module here. The same is the case with preempt_disable and the other interrupt-disabling kernel functions.
Now the point is: pthreads are in user space, and your preemption-disabling functions are in kernel space. How do they interact?
Either you write a new system call of your own in which you do your preemption and interrupt disabling and call it from user space, or you resort to other kernel/user-space interfaces like procfs, sysfs, ioctl, etc.
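To make the kernel-space side concrete, here is a minimal, hypothetical module sketch; what you actually time inside the critical section is up to you:

#include <linux/module.h>
#include <linux/preempt.h>
#include <linux/irqflags.h>

static int __init measure_init(void)
{
    unsigned long flags;

    preempt_disable();          /* no preemption on this CPU */
    local_irq_save(flags);      /* no local interrupts */

    /* ... code to be timed would go here ... */

    local_irq_restore(flags);
    preempt_enable();
    return 0;
}

static void __exit measure_exit(void)
{
}

module_init(measure_init);
module_exit(measure_exit);
MODULE_LICENSE("GPL");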
But I am really skeptical as to how all this will help you benchmark library functions. You may want to have a look at how performance is typically measured using rdtsc.
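For completeness, a minimal user-space sketch of the usual pattern (x86 only, GCC/Clang intrinsics; serialization with cpuid and averaging over many runs are left out for brevity):

#include <stdio.h>
#include <stdint.h>
#include <x86intrin.h>          /* __rdtsc, __rdtscp */

int main(void)
{
    unsigned int aux;
    uint64_t start, end;

    start = __rdtsc();          /* read the TSC before the measured code */

    /* ... call the library function being measured here ... */

    end = __rdtscp(&aux);       /* rdtscp waits for earlier instructions to finish */

    printf("cycles: %llu\n", (unsigned long long)(end - start));
    return 0;
}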

How early can I call kmalloc in an ARM Linux kernel?

I would like to dynamically allocate memory from the machine_init function in my ARM Linux kernel. However, my tests indicate that calling kmalloc sometimes results in a complete failure of the system to boot.
My debugging tools are very limited so I can't give much more information regarding the failure.
Simply put, is it legal to call kmalloc from a machine_init function in ARM Linux, and, if not, is there an alternative?
I understand that in most cases it is wrong-headed to be allocating memory this early in the boot process (this kind of work should be done by the device drivers); however, I am convinced that my particular project requires it.
I can't see where machine_init is called from, but I can't help thinking you're trying to do the wrong thing.
Device drivers and other subsystems have their own init time; trying to do things very early on is usually a mistake (because something required isn't started yet). You can definitely call kmalloc during the initialisation of a device driver (at least for most drivers; maybe the console driver is different).
In any case, the fact that you're on ARM suggests that it's an embedded system, so you're unlikely to have to deal with a lot of different hardware. Can't you just statically allocate an array with as many elements as could possibly be required (and give an error if that is exceeded)?
kmalloc is a kernel API on top of the slab/slob/slub memory framework. Once whichever of these frameworks the kernel uses has been initialized, kmalloc works fine. Make sure your call happens after the slab/slob/slub initialization.
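If in doubt about whether that point in machine_init is late enough, kernel code commonly guards early allocations with slab_is_available(); a minimal sketch:

#include <linux/slab.h>

static void *early_buf;

static void try_early_alloc(void)
{
    /* slab_is_available() reports whether the slab/slob/slub allocator is up */
    if (slab_is_available())
        early_buf = kmalloc(1024, GFP_KERNEL);
    /* else: too early -- kmalloc cannot be used yet at this point in boot */
}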
cheers
