What is the "current" in Linux kernel source? - linux

I'm studying about Linux kernel and I have a problem.
I see many Linux kernel source files have current->files. So what is the current?
struct file *fget(unsigned int fd)
{
struct file *file;
struct files_struct *files = current->files;
rcu_read_lock();
file = fcheck_files(files, fd);
if (file) {
/* File object ref couldn't be taken */
if (file->f_mode & FMODE_PATH ||
!atomic_long_inc_not_zero(&file->f_count))
file = NULL;
}
rcu_read_unlock();
return file;
}

It's a pointer to the current process (i.e. the process that issued the system call).
On x86, it's defined in arch/x86/include/asm/current.h (similar files for other archs).
#ifndef _ASM_X86_CURRENT_H
#define _ASM_X86_CURRENT_H
#include <linux/compiler.h>
#include <asm/percpu.h>
#ifndef __ASSEMBLY__
struct task_struct;
DECLARE_PER_CPU(struct task_struct *, current_task);
static __always_inline struct task_struct *get_current(void)
{
return percpu_read_stable(current_task);
}
#define current get_current()
#endif /* __ASSEMBLY__ */
#endif /* _ASM_X86_CURRENT_H */
More information in Linux Device Drivers chapter 2:
The current pointer refers to the user process currently executing. During the execution of a system call, such as open or read, the current process is the one that invoked the call. Kernel code can use process-specific information by using current, if it needs to do so. [...]

Current is a global variable of type struct task_struct. You can find it's definition at [1].
Files is a struct files_struct and it contains information of the files used by the current process.
[1] http://students.mimuw.edu.pl/SO/LabLinux/PROCESY/ZRODLA/sched.h.html

this is ARM64 definition. in arch/arm64/include/asm/current.h, https://elixir.bootlin.com/linux/latest/source/arch/arm64/include/asm/current.h
struct task_struct;
/*
* We don't use read_sysreg() as we want the compiler to cache the value where
* possible.
*/
static __always_inline struct task_struct *get_current(void)
{
unsigned long sp_el0;
asm ("mrs %0, sp_el0" : "=r" (sp_el0));
return (struct task_struct *)sp_el0;
}
#define current get_current()
which just use the sp_el0 register. As the pointer to current process's task_struct

Related

System Call in a System Call

Hi I'm trying to add a custom system call to a lubuntu kernel.I'm trying to kill a process within this system call. I tried kill() system call in original ubuntu kernel. But i got compiler errors while doing that. I have no idea how to do that properly. Thanks in advance for your answers.
#define _POSIX_SOURCE
#include <linux/syscalls.h>
#include <linux/kernel.h>
#include <linux/sched.h>
#include <linux/list.h>
#include <linux/signal.h>
#include <linux/types.h>
asmlinkage long sys_my_process_terminator (pid_t pid , int flag)
{
struct task_struct *task;
struct list_head *list;
struct list_head *siblist;
// firstly check the flag
struct task_struct *myprocess;
struct task_struct *sibchild;
myprocess = find_task_by_vpid(pid);
struct task_struct *pp;
pp =myprocess->parent;
if (flag == 0){
// this loop under this comment will kill all the children of the given process
list_for_each(list, &myprocess->children) {
task = list_entry(list, struct task_struct, sibling);
printk ("%s [%d] \n" , task->comm , task->pid);
kill(task->pid,SIGKILL);
}
}
else if (flag==1) {
list_for_each(list, &pp->children) {
task = list_entry(list, struct task_struct, sibling);
list_for_each(siblist, &task->children) {
sibchild = list_entry(siblist, struct task_struct, sibling);
printk ("%s [%d] \n" , sibchild->comm , sibchild->pid);
kill(sibchild->pid,SIGKILL);
}
if (task->pid !=pid){
printk ("%s [%d] \n" , task->comm , task->pid);
kill(task->pid,
}
You can not use the system call in a system call like that fork, kill, exit.
The kill() system call is commonly used to send to conventional process or multithreaded applications; its corresponding service routine is the sys_kill() function.
So you can use sys_kill() function on the kernel level in order to kill a process.
Usage of sys_kill() :
https://www.kernel.org/doc/htmldocs/device-drivers/API-sys-kill.html
and source book:
"Linux Kernel Development, 3rd Edition" by Robert Love (Publisher: Addison Wesley Professional, 2010)
Note : 2. Part'ı yapabildiniz mi ?? :)

"current" in Linux kernel code

As I was going through the below chunk of Linux char driver code, I found the structure pointer current in printk.
I want to know what structure the current is pointing to and its complete elements.
What purpose does this structure serve?
ssize_t sleepy_read (struct file *filp, char __user *buf, size_t count, loff_t *pos)
{
printk(KERN_DEBUG "process %i (%s) going to sleep\n",
current->pid, current->comm);
wait_event_interruptible(wq, flag != 0);
flag = 0;
printk(KERN_DEBUG "awoken %i (%s)\n", current->pid, current->comm);
return 0;
}
It is a pointer to the current process ie, the process which has issued the system call.
From the docs:
The Current Process
Although kernel modules don't execute sequentially as applications do,
most actions performed by the kernel are related to a specific
process. Kernel code can know the current process driving it by
accessing the global item current, a pointer to struct task_struct,
which as of version 2.4 of the kernel is declared in
<asm/current.h>, included by <linux/sched.h>. The current pointer
refers to the user process currently executing. During the execution
of a system call, such as open or read, the current process is the one
that invoked the call. Kernel code can use process-specific
information by using current, if it needs to do so. An example of this
technique is presented in "Access Control on a Device File", in
Chapter 5, "Enhanced Char Driver Operations".
Actually, current is not properly a global variable any more, like it
was in the first Linux kernels. The developers optimized access to the
structure describing the current process by hiding it in the stack
page. You can look at the details of current in <asm/current.h>. While
the code you'll look at might seem hairy, we must keep in mind that
Linux is an SMP-compliant system, and a global variable simply won't
work when you are dealing with multiple CPUs. The details of the
implementation remain hidden to other kernel subsystems though, and a
device driver can just include and refer to the
current process.
From a module's point of view, current is just like the external
reference printk. A module can refer to current wherever it sees fit.
For example, the following statement prints the process ID and the
command name of the current process by accessing certain fields in
struct task_struct:
printk("The process is \"%s\" (pid %i)\n",
current->comm, current->pid);
The command name stored in current->comm is the base name of the
program file that is being executed by the current process.
Here is the complete structure the "current" is pointing to
task_struct
Each task_struct data structure describes a process or task in the system.
struct task_struct {
/* these are hardcoded - don't touch */
volatile long state; /* -1 unrunnable, 0 runnable, >0 stopped */
long counter;
long priority;
unsigned long signal;
unsigned long blocked; /* bitmap of masked signals */
unsigned long flags; /* per process flags, defined below */
int errno;
long debugreg[8]; /* Hardware debugging registers */
struct exec_domain *exec_domain;
/* various fields */
struct linux_binfmt *binfmt;
struct task_struct *next_task, *prev_task;
struct task_struct *next_run, *prev_run;
unsigned long saved_kernel_stack;
unsigned long kernel_stack_page;
int exit_code, exit_signal;
/* ??? */
unsigned long personality;
int dumpable:1;
int did_exec:1;
int pid;
int pgrp;
int tty_old_pgrp;
int session;
/* boolean value for session group leader */
int leader;
int groups[NGROUPS];
/*
* pointers to (original) parent process, youngest child, younger sibling,
* older sibling, respectively. (p->father can be replaced with
* p->p_pptr->pid)
*/
struct task_struct *p_opptr, *p_pptr, *p_cptr,
*p_ysptr, *p_osptr;
struct wait_queue *wait_chldexit;
unsigned short uid,euid,suid,fsuid;
unsigned short gid,egid,sgid,fsgid;
unsigned long timeout, policy, rt_priority;
unsigned long it_real_value, it_prof_value, it_virt_value;
unsigned long it_real_incr, it_prof_incr, it_virt_incr;
struct timer_list real_timer;
long utime, stime, cutime, cstime, start_time;
/* mm fault and swap info: this can arguably be seen as either
mm-specific or thread-specific */
unsigned long min_flt, maj_flt, nswap, cmin_flt, cmaj_flt, cnswap;
int swappable:1;
unsigned long swap_address;
unsigned long old_maj_flt; /* old value of maj_flt */
unsigned long dec_flt; /* page fault count of the last time */
unsigned long swap_cnt; /* number of pages to swap on next pass */
/* limits */
struct rlimit rlim[RLIM_NLIMITS];
unsigned short used_math;
char comm[16];
/* file system info */
int link_count;
struct tty_struct *tty; /* NULL if no tty */
/* ipc stuff */
struct sem_undo *semundo;
struct sem_queue *semsleeping;
/* ldt for this task - used by Wine. If NULL, default_ldt is used */
struct desc_struct *ldt;
/* tss for this task */
struct thread_struct tss;
/* filesystem information */
struct fs_struct *fs;
/* open file information */
struct files_struct *files;
/* memory management info */
struct mm_struct *mm;
/* signal handlers */
struct signal_struct *sig;
#ifdef __SMP__
int processor;
int last_processor;
int lock_depth; /* Lock depth.
We can context switch in and out
of holding a syscall kernel lock... */
#endif
};

how to get struct i2c_client *client structure inside the ioctl?

I am moving userspace sysfs interaction to the "/dev" using miscregister using ioctl method.
Can we resolve client structure(struct i2c_client) from Inode of please somebody tell how to get client structure inside ioctl. I need to do i2c transfer inside ioctl.
I referred this link :
http://stackoverflow.com/questions/2635038/inode-to-device-information
but coudln get any answer.
please someone give solution.
while you open your device in kernel using your open function. (this part of code is copied from one of the mainline drivers (drivers/i2c/i2c-dev.c) to make things easy for you)
my_i2c_device_open(struct inode *inode, struct file *file)
{
unsigned int minor = iminor(inode);
struct i2c_client *client;
struct i2c_adapter *adap;
struct i2c_dev *i2c_dev;
i2c_dev = i2c_dev_get_by_minor(minor);
if (!i2c_dev)
return -ENODEV;
adap = i2c_get_adapter(i2c_dev->adap->nr);
if (!adap)
return -ENODEV;
client = kzalloc(sizeof(*client), GFP_KERNEL);
if (!client) {
i2c_put_adapter(adap);
return -ENOMEM;
}
snprintf(client->name, I2C_NAME_SIZE, "i2c-dev %d", adap->nr);
client->adapter = adap;
file->private_data = client;
return 0;
}
when you call ioctl you can retrieve the i2c_client from the file pointer of your device:
static long my_i2c_device_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
struct i2c_client *client = file->private_data;
}
Hope this makes your life easy.'
This reference might help:
Reason to pass data using struct inode and struct file in Linux device driver programming
You build yourself a structure equivalent with "struct scull_dev" in the example above and you store there a reference to the i2c_client structure. In the IOCTL function you can retrieve later on the main control structure and the reference to the i2c_client through container_of.

Intercepting a system call [duplicate]

I'm trying to write some simple test code as a demonstration of hooking the system call table.
"sys_call_table" is no longer exported in 2.6, so I'm just grabbing the address from the System.map file, and I can see it is correct (Looking through the memory at the address I found, I can see the pointers to the system calls).
However, when I try to modify this table, the kernel gives an "Oops" with "unable to handle kernel paging request at virtual address c061e4f4" and the machine reboots.
This is CentOS 5.4 running 2.6.18-164.10.1.el5. Is there some sort of protection or do I just have a bug? I know it comes with SELinux, and I've tried putting it in to permissive mode, but it doesn't make a difference
Here's my code:
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/unistd.h>
void **sys_call_table;
asmlinkage int (*original_call) (const char*, int, int);
asmlinkage int our_sys_open(const char* file, int flags, int mode)
{
printk("A file was opened\n");
return original_call(file, flags, mode);
}
int init_module()
{
// sys_call_table address in System.map
sys_call_table = (void*)0xc061e4e0;
original_call = sys_call_table[__NR_open];
// Hook: Crashes here
sys_call_table[__NR_open] = our_sys_open;
}
void cleanup_module()
{
// Restore the original call
sys_call_table[__NR_open] = original_call;
}
I finally found the answer myself.
http://www.linuxforums.org/forum/linux-kernel/133982-cannot-modify-sys_call_table.html
The kernel was changed at some point so that the system call table is read only.
cypherpunk:
Even if it is late but the Solution
may interest others too: In the
entry.S file you will find: Code:
.section .rodata,"a"
#include "syscall_table_32.S"
sys_call_table -> ReadOnly You have to
compile the Kernel new if you want to
"hack" around with sys_call_table...
The link also has an example of changing the memory to be writable.
nasekomoe:
Hi everybody. Thanks for replies. I
solved the problem long ago by
modifying access to memory pages. I
have implemented two functions that do
it for my upper level code:
#include <asm/cacheflush.h>
#ifdef KERN_2_6_24
#include <asm/semaphore.h>
int set_page_rw(long unsigned int _addr)
{
struct page *pg;
pgprot_t prot;
pg = virt_to_page(_addr);
prot.pgprot = VM_READ | VM_WRITE;
return change_page_attr(pg, 1, prot);
}
int set_page_ro(long unsigned int _addr)
{
struct page *pg;
pgprot_t prot;
pg = virt_to_page(_addr);
prot.pgprot = VM_READ;
return change_page_attr(pg, 1, prot);
}
#else
#include <linux/semaphore.h>
int set_page_rw(long unsigned int _addr)
{
return set_memory_rw(_addr, 1);
}
int set_page_ro(long unsigned int _addr)
{
return set_memory_ro(_addr, 1);
}
#endif // KERN_2_6_24
Here's a modified version of the original code that works for me.
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/unistd.h>
#include <asm/semaphore.h>
#include <asm/cacheflush.h>
void **sys_call_table;
asmlinkage int (*original_call) (const char*, int, int);
asmlinkage int our_sys_open(const char* file, int flags, int mode)
{
printk("A file was opened\n");
return original_call(file, flags, mode);
}
int set_page_rw(long unsigned int _addr)
{
struct page *pg;
pgprot_t prot;
pg = virt_to_page(_addr);
prot.pgprot = VM_READ | VM_WRITE;
return change_page_attr(pg, 1, prot);
}
int init_module()
{
// sys_call_table address in System.map
sys_call_table = (void*)0xc061e4e0;
original_call = sys_call_table[__NR_open];
set_page_rw(sys_call_table);
sys_call_table[__NR_open] = our_sys_open;
}
void cleanup_module()
{
// Restore the original call
sys_call_table[__NR_open] = original_call;
}
Thanks Stephen, your research here was helpful to me. I had a few problems, though, as I was trying this on a 2.6.32 kernel, and getting WARNING: at arch/x86/mm/pageattr.c:877 change_page_attr_set_clr+0x343/0x530() (Not tainted) followed by a kernel OOPS about not being able to write to the memory address.
The comment above the mentioned line states:
// People should not be passing in unaligned addresses
The following modified code works:
int set_page_rw(long unsigned int _addr)
{
return set_memory_rw(PAGE_ALIGN(_addr) - PAGE_SIZE, 1);
}
int set_page_ro(long unsigned int _addr)
{
return set_memory_ro(PAGE_ALIGN(_addr) - PAGE_SIZE, 1);
}
Note that this still doesn't actually set the page as read/write in some situations. The static_protections() function, which is called inside of set_memory_rw(), removes the _PAGE_RW flag if:
It's in the BIOS area
The address is inside .rodata
CONFIG_DEBUG_RODATA is set and the kernel is set to read-only
I found this out after debugging why I still got "unable to handle kernel paging request" when trying to modify the address of kernel functions. I was eventually able to solve that problem by finding the page table entry for the address myself and manually setting it to writable. Thankfully, the lookup_address() function is exported in version 2.6.26+. Here is the code I wrote to do that:
void set_addr_rw(unsigned long addr) {
unsigned int level;
pte_t *pte = lookup_address(addr, &level);
if (pte->pte &~ _PAGE_RW) pte->pte |= _PAGE_RW;
}
void set_addr_ro(unsigned long addr) {
unsigned int level;
pte_t *pte = lookup_address(addr, &level);
pte->pte = pte->pte &~_PAGE_RW;
}
Finally, while Mark's answer is technically correct, it'll case problem when ran inside Xen. If you want to disable write-protect, use the read/write cr0 functions. I macro them like this:
#define GPF_DISABLE write_cr0(read_cr0() & (~ 0x10000))
#define GPF_ENABLE write_cr0(read_cr0() | 0x10000)
Hope this helps anyone else who stumbles upon this question.
Note that the following will also work instead of using change_page_attr and cannot be depreciated:
static void disable_page_protection(void) {
unsigned long value;
asm volatile("mov %%cr0,%0" : "=r" (value));
if (value & 0x00010000) {
value &= ~0x00010000;
asm volatile("mov %0,%%cr0": : "r" (value));
}
}
static void enable_page_protection(void) {
unsigned long value;
asm volatile("mov %%cr0,%0" : "=r" (value));
if (!(value & 0x00010000)) {
value |= 0x00010000;
asm volatile("mov %0,%%cr0": : "r" (value));
}
}
If you are dealing with kernel 3.4 and later (it can also work with earlier kernels, I didn't test it) I would recommend a smarter way to acquire the system callы table location.
For example
#include <linux/module.h>
#include <linux/kallsyms.h>
static unsigned long **p_sys_call_table;
/* Aquire system calls table address */
p_sys_call_table = (void *) kallsyms_lookup_name("sys_call_table");
That's it. No addresses, it works fine with every kernel I've tested.
The same way you can use a not exported Kernel function from your module:
static int (*ref_access_remote_vm)(struct mm_struct *mm, unsigned long addr,
void *buf, int len, int write);
ref_access_remote_vm = (void *)kallsyms_lookup_name("access_remote_vm");
Enjoy!
As others have hinted, the whole story is a bit different now on modern kernels. I'll be covering x86-64 here, for syscall hijacking on modern arm64 refer to this other answer of mine. Also NOTE: this is plain and simple syscall hijacking. Non-invasive hooking can be done in a much nicer way using kprobes.
Since Linux v4.17, x86 (both 64 and 32 bit) now uses syscall wrappers that take a struct pt_regs * as the only argument (see commit 1, commit 2). You can see arch/x86/include/asm/syscall.h for the definitions.
Additionally, as others have described already in different answers, the simplest way to modify sys_call_table is to temporarily disable CR0 WP (Write-Protect) bit, which could be done using read_cr0() and write_cr0(). However, since Linux v5.3, [native_]write_cr0 will check sensitive bits that should never change (like WP) and refuse to change them (commit). In order to work around this, we need to write CR0 manually using inline assembly.
Here is a working kernel module (tested on Linux 5.10 and 5.18) that does syscall hijacking on modern Linux x86-64 considering the above caveats and assuming that you already know the address of sys_call_table (if you also want to find that in the module, see Proper way of getting the address of non-exported kernel symbols in a Linux kernel module):
// SPDX-License-Identifier: (GPL-2.0 OR MIT)
/**
* Test syscall table hijacking on x86-64. This module will replace the `read`
* syscall with a simple wrapper which logs every invocation of `read` using
* printk().
*
* Tested on Linux x86-64 v5.10, v5.18.
*
* Usage:
*
* sudo cat /proc/kallsyms | grep sys_call_table # grab address
* sudo insmod syscall_hijack.ko sys_call_table_addr=0x<address_here>
*/
#include <linux/init.h> // module_{init,exit}()
#include <linux/module.h> // THIS_MODULE, MODULE_VERSION, ...
#include <linux/kernel.h> // printk(), pr_*()
#include <asm/special_insns.h> // {read,write}_cr0()
#include <asm/processor-flags.h> // X86_CR0_WP
#include <asm/unistd.h> // __NR_*
#ifdef pr_fmt
#undef pr_fmt
#endif
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
typedef long (*sys_call_ptr_t)(const struct pt_regs *);
static sys_call_ptr_t *real_sys_call_table;
static sys_call_ptr_t original_read;
static unsigned long sys_call_table_addr;
module_param(sys_call_table_addr, ulong, 0);
MODULE_PARM_DESC(sys_call_table_addr, "Address of sys_call_table");
// Since Linux v5.3 [native_]write_cr0 won't change "sensitive" CR0 bits, need
// to re-implement this ourselves.
static void write_cr0_unsafe(unsigned long val)
{
asm volatile("mov %0,%%cr0": "+r" (val) : : "memory");
}
static long myread(const struct pt_regs *regs)
{
pr_info("read(%ld, 0x%lx, %lx)\n", regs->di, regs->si, regs->dx);
return original_read(regs);
}
static int __init modinit(void)
{
unsigned long old_cr0;
real_sys_call_table = (typeof(real_sys_call_table))sys_call_table_addr;
pr_info("init\n");
// Temporarily disable CR0 WP to be able to write to read-only pages
old_cr0 = read_cr0();
write_cr0_unsafe(old_cr0 & ~(X86_CR0_WP));
// Overwrite syscall and save original to be restored later
original_read = real_sys_call_table[__NR_read];
real_sys_call_table[__NR_read] = myread;
// Restore CR0 WP
write_cr0_unsafe(old_cr0);
pr_info("init done\n");
return 0;
}
static void __exit modexit(void)
{
unsigned long old_cr0;
pr_info("exit\n");
old_cr0 = read_cr0();
write_cr0_unsafe(old_cr0 & ~(X86_CR0_WP));
// Restore original syscall
real_sys_call_table[__NR_read] = original_read;
write_cr0_unsafe(old_cr0);
pr_info("goodbye\n");
}
module_init(modinit);
module_exit(modexit);
MODULE_VERSION("0.1");
MODULE_DESCRIPTION("Test syscall table hijacking on x86-64.");
MODULE_AUTHOR("Marco Bonelli");
MODULE_LICENSE("Dual MIT/GPL");

Writing to eventfd from kernel module

I have created an eventfd instance in a userspace program using eventfd(). Is there a way in which I can pass some reference (a pointer to its struct or pid+fd pair) to this created instance of eventfd to a kernel module so that it can update the counter value?
Here is what I want to do:
I am developing a userspace program which needs to exchange data and signals with a kernel space module which I have written.
For transferring data, I am already using ioctl. But I want the kernel module to be able to signal the userspace program whenever new data is ready for it to consume over ioctl.
To do this, my userspace program will create a few eventfds in various threads. These threads will wait on these eventfds using select() and whenever the kernel module updates the counts on these eventfds, they will go on to consume the data by requesting for it over ioctl.
The problem is, how do I resolve the "struct file *" pointers to these eventfds from kernelspace? What kind of information bout the eventfds can I sent to kernel modules so that it can get the pointers to the eventfds? what functions would I use in the kernel module to get those pointers?
Is there better way to signal events to userspace from kernelspace?
I cannot let go of using select().
I finally figured out how to do this. I realized that each open file on a system could be identified by the pid of one of the processes which opened it and the fd corresponding to that file (within that process's context). So if my kernel module knows the pid and fd, it can look up the struct * task_struct of the process and from that the struct * files and finally using the fd, it can acquire the pointer to the eventfd's struct * file. Then, using this last pointer, it can write to the eventfd's counter.
Here are the codes for the userspace program and the kernel module that I wrote up to demonstrate the concept (which now work):
Userspace C code (efd_us.c):
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h> //Definition of uint64_t
#include <sys/eventfd.h>
int efd; //Eventfd file descriptor
uint64_t eftd_ctr;
int retval; //for select()
fd_set rfds; //for select()
int s;
int main() {
//Create eventfd
efd = eventfd(0,0);
if (efd == -1){
printf("\nUnable to create eventfd! Exiting...\n");
exit(EXIT_FAILURE);
}
printf("\nefd=%d pid=%d",efd,getpid());
//Watch efd
FD_ZERO(&rfds);
FD_SET(efd, &rfds);
printf("\nNow waiting on select()...");
fflush(stdout);
retval = select(efd+1, &rfds, NULL, NULL, NULL);
if (retval == -1){
printf("\nselect() error. Exiting...");
exit(EXIT_FAILURE);
} else if (retval > 0) {
printf("\nselect() says data is available now. Exiting...");
printf("\nreturned from select(), now executing read()...");
s = read(efd, &eftd_ctr, sizeof(uint64_t));
if (s != sizeof(uint64_t)){
printf("\neventfd read error. Exiting...");
} else {
printf("\nReturned from read(), value read = %lld",eftd_ctr);
}
} else if (retval == 0) {
printf("\nselect() says that no data was available");
}
printf("\nClosing eventfd. Exiting...");
close(efd);
printf("\n");
exit(EXIT_SUCCESS);
}
Kernel Module C code (efd_lkm.c):
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/pid.h>
#include <linux/sched.h>
#include <linux/fdtable.h>
#include <linux/rcupdate.h>
#include <linux/eventfd.h>
//Received from userspace. Process ID and eventfd's File descriptor are enough to uniquely identify an eventfd object.
int pid;
int efd;
//Resolved references...
struct task_struct * userspace_task = NULL; //...to userspace program's task struct
struct file * efd_file = NULL; //...to eventfd's file struct
struct eventfd_ctx * efd_ctx = NULL; //...and finally to eventfd context
//Increment Counter by 1
static uint64_t plus_one = 1;
int init_module(void) {
printk(KERN_ALERT "~~~Received from userspace: pid=%d efd=%d\n",pid,efd);
userspace_task = pid_task(find_vpid(pid), PIDTYPE_PID);
printk(KERN_ALERT "~~~Resolved pointer to the userspace program's task struct: %p\n",userspace_task);
printk(KERN_ALERT "~~~Resolved pointer to the userspace program's files struct: %p\n",userspace_task->files);
rcu_read_lock();
efd_file = fcheck_files(userspace_task->files, efd);
rcu_read_unlock();
printk(KERN_ALERT "~~~Resolved pointer to the userspace program's eventfd's file struct: %p\n",efd_file);
efd_ctx = eventfd_ctx_fileget(efd_file);
if (!efd_ctx) {
printk(KERN_ALERT "~~~eventfd_ctx_fileget() Jhol, Bye.\n");
return -1;
}
printk(KERN_ALERT "~~~Resolved pointer to the userspace program's eventfd's context: %p\n",efd_ctx);
eventfd_signal(efd_ctx, plus_one);
printk(KERN_ALERT "~~~Incremented userspace program's eventfd's counter by 1\n");
eventfd_ctx_put(efd_ctx);
return 0;
}
void cleanup_module(void) {
printk(KERN_ALERT "~~~Module Exiting...\n");
}
MODULE_LICENSE("GPL");
module_param(pid, int, 0);
module_param(efd, int, 0);
To run this, carry out the following steps:
Compile the userspace program (efd_us.out) and the kernel module (efd_lkm.ko)
Run the userspace program (./efd_us.out) and note the pid and efd values that it print. (for eg. "pid=2803 efd=3". The userspace program will wait endlessly on select()
Open a new terminal window and insert the kernel module passing the pid and efd as params: sudo insmod efd_lkm.ko pid=2803 efd=3
Switch back to the userspace program window and you will see that the userspace program has broken out of select and exited.
Consult the kernel source here:
http://lxr.free-electrons.com/source/fs/eventfd.c
Basically, send your userspace file descriptor, as produced by eventfd(), to your module via ioctl() or some other path. From the kernel, call eventfd_ctx_fdget() to get an eventfd context, then eventfd_signal() on the resulting context. Don't forget eventfd_ctx_put() when you're done with the context.
how do I resolve the "struct file *" pointers to these eventfds from kernelspace
You must resolve those pointers into data structures that this interface you've created has published (create new types and read the fields you want from struct file into it).
Is there better way to signal events to userspace from kernelspace?
Netlink sockets are another convenient way for the kernel to communicate with userspace. "Better" is in the eye of the beholder.

Resources