Writing to eventfd from kernel module - linux

I have created an eventfd instance in a userspace program using eventfd(). Is there a way in which I can pass some reference (a pointer to its struct or pid+fd pair) to this created instance of eventfd to a kernel module so that it can update the counter value?
Here is what I want to do:
I am developing a userspace program which needs to exchange data and signals with a kernel space module which I have written.
For transferring data, I am already using ioctl. But I want the kernel module to be able to signal the userspace program whenever new data is ready for it to consume over ioctl.
To do this, my userspace program will create a few eventfds in various threads. These threads will wait on these eventfds using select() and whenever the kernel module updates the counts on these eventfds, they will go on to consume the data by requesting for it over ioctl.
The problem is, how do I resolve the "struct file *" pointers to these eventfds from kernelspace? What kind of information about the eventfds can I send to the kernel module so that it can get the pointers to the eventfds? What functions would I use in the kernel module to get those pointers?
Is there a better way to signal events to userspace from kernelspace?
I cannot let go of using select().

I finally figured out how to do this. I realized that each open file on a system can be identified by the pid of one of the processes that opened it and the fd corresponding to that file (within that process's context). So if my kernel module knows the pid and fd, it can look up the process's struct task_struct *, from that its struct files_struct *, and finally, using the fd, the eventfd's struct file *. Then, using this last pointer, it can write to the eventfd's counter.
Here is the code for the userspace program and the kernel module that I wrote to demonstrate the concept (both now work):
Userspace C code (efd_us.c):
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h> //Definition of uint64_t
#include <sys/eventfd.h>
#include <sys/select.h> //select(), fd_set, FD_ZERO, FD_SET
int efd; //Eventfd file descriptor
uint64_t eftd_ctr;
int retval; //for select()
fd_set rfds; //for select()
int s;
int main() {
//Create eventfd
efd = eventfd(0,0);
if (efd == -1){
printf("\nUnable to create eventfd! Exiting...\n");
exit(EXIT_FAILURE);
}
printf("\nefd=%d pid=%d",efd,getpid());
//Watch efd
FD_ZERO(&rfds);
FD_SET(efd, &rfds);
printf("\nNow waiting on select()...");
fflush(stdout);
retval = select(efd+1, &rfds, NULL, NULL, NULL);
if (retval == -1){
printf("\nselect() error. Exiting...");
exit(EXIT_FAILURE);
} else if (retval > 0) {
printf("\nselect() says data is available now.");
printf("\nreturned from select(), now executing read()...");
s = read(efd, &eftd_ctr, sizeof(uint64_t));
if (s != sizeof(uint64_t)){
printf("\neventfd read error. Exiting...");
} else {
printf("\nReturned from read(), value read = %llu",(unsigned long long)eftd_ctr); //counter is unsigned 64-bit
}
} else if (retval == 0) {
printf("\nselect() says that no data was available");
}
printf("\nClosing eventfd. Exiting...");
close(efd);
printf("\n");
exit(EXIT_SUCCESS);
}
Kernel Module C code (efd_lkm.c):
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/pid.h>
#include <linux/sched.h>
#include <linux/fdtable.h>
#include <linux/rcupdate.h>
#include <linux/eventfd.h>
//Received from userspace. Process ID and eventfd's File descriptor are enough to uniquely identify an eventfd object.
int pid;
int efd;
//Resolved references...
struct task_struct * userspace_task = NULL; //...to userspace program's task struct
struct file * efd_file = NULL; //...to eventfd's file struct
struct eventfd_ctx * efd_ctx = NULL; //...and finally to eventfd context
//Increment Counter by 1
static uint64_t plus_one = 1;
int init_module(void) {
printk(KERN_ALERT "~~~Received from userspace: pid=%d efd=%d\n",pid,efd);
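userspace_task = pid_task(find_vpid(pid), PIDTYPE_PID);
if (!userspace_task) { //no such process: bail out before dereferencing the pointer
printk(KERN_ALERT "~~~No task found for pid=%d\n",pid);
return -ESRCH;
}
printk(KERN_ALERT "~~~Resolved pointer to the userspace program's task struct: %p\n",userspace_task);
printk(KERN_ALERT "~~~Resolved pointer to the userspace program's files struct: %p\n",userspace_task->files);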
rcu_read_lock();
efd_file = fcheck_files(userspace_task->files, efd);
rcu_read_unlock();
printk(KERN_ALERT "~~~Resolved pointer to the userspace program's eventfd's file struct: %p\n",efd_file);
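if (!efd_file) { //fd is not open in that process
printk(KERN_ALERT "~~~No open file for efd=%d\n",efd);
return -EBADF;
}
efd_ctx = eventfd_ctx_fileget(efd_file);
if (IS_ERR(efd_ctx)) { //on failure this returns an ERR_PTR, not NULL
printk(KERN_ALERT "~~~eventfd_ctx_fileget() failed. Bye.\n");
return PTR_ERR(efd_ctx);
}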
printk(KERN_ALERT "~~~Resolved pointer to the userspace program's eventfd's context: %p\n",efd_ctx);
eventfd_signal(efd_ctx, plus_one);
printk(KERN_ALERT "~~~Incremented userspace program's eventfd's counter by 1\n");
eventfd_ctx_put(efd_ctx);
return 0;
}
void cleanup_module(void) {
printk(KERN_ALERT "~~~Module Exiting...\n");
}
MODULE_LICENSE("GPL");
module_param(pid, int, 0);
module_param(efd, int, 0);
To run this, carry out the following steps:
Compile the userspace program (efd_us.out) and the kernel module (efd_lkm.ko)
Run the userspace program (./efd_us.out) and note the pid and efd values that it prints (e.g. "efd=3 pid=2803"). The userspace program will wait endlessly on select().
Open a new terminal window and insert the kernel module passing the pid and efd as params: sudo insmod efd_lkm.ko pid=2803 efd=3
Switch back to the userspace program window and you will see that the userspace program has broken out of select and exited.
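The userspace half of this flow can also be tried without loading a module. The following is a minimal sketch, not part of the original program: signal_event() and wait_for_event() are hypothetical helper names, and a plain write() stands in for the module's eventfd_signal() call. It assumes Linux with eventfd support.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/select.h>
#include <unistd.h>

/* Add n to the eventfd counter (stand-in for the module's eventfd_signal()).
 * Returns 0 on success, -1 on error. */
int signal_event(int efd, uint64_t n)
{
    return write(efd, &n, sizeof(n)) == (ssize_t)sizeof(n) ? 0 : -1;
}

/* Wait up to timeout_ms for the eventfd to become readable, then read its
 * counter (which resets it to zero for a non-semaphore eventfd).
 * Returns 1 with *count set, 0 on timeout, -1 on error. */
int wait_for_event(int efd, int timeout_ms, uint64_t *count)
{
    fd_set rfds;
    struct timeval tv;

    assert(count != NULL);

    FD_ZERO(&rfds);
    FD_SET(efd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    int ready = select(efd + 1, &rfds, NULL, NULL, &tv);
    if (ready <= 0)
        return ready;   /* 0 = timeout, -1 = error */

    if (read(efd, count, sizeof(*count)) != (ssize_t)sizeof(*count))
        return -1;
    return 1;
}
```

Several signals that arrive before the reader wakes up are summed into one counter value, so the consumer should treat the value read as "at least one batch of data is ready" and drain everything available over ioctl.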

Consult the kernel source here:
http://lxr.free-electrons.com/source/fs/eventfd.c
Basically, send your userspace file descriptor, as produced by eventfd(), to your module via ioctl() or some other path. From the kernel, call eventfd_ctx_fdget() to get an eventfd context, then eventfd_signal() on the resulting context. Don't forget eventfd_ctx_put() when you're done with the context.

how do I resolve the "struct file *" pointers to these eventfds from kernelspace
You must resolve those pointers into data structures that this interface you've created has published (create new types and read the fields you want from struct file into it).
Is there better way to signal events to userspace from kernelspace?
Netlink sockets are another convenient way for the kernel to communicate with userspace. "Better" is in the eye of the beholder.

Related

Why EAGAIN in pthread_key_create happens?

Sometimes when I try to create a key with pthread_key_create, I get the EAGAIN error code. Is it possible to know exactly why?
Documentation says:
The system lacked the necessary resources to create another thread-specific data key, or the system-imposed limit on the total number of keys per process [PTHREAD_KEYS_MAX] would be exceeded.
How can I check whether the key limit was the cause? Is there some kind of monitoring tool that shows how many keys are already in use and how many can still be created?
One important thing about our code: we use fork() and have multiple processes running. And each process could have multiple threads.
I found that we don't have independent limit for thread keys when we use fork(). Here is little example.
#include <stdio.h>
#include <pthread.h>
#include <string.h>
#include <unistd.h>
size_t create_keys(pthread_key_t *keys, size_t number_of_keys)
{
size_t counter = 0;
for (size_t i = 0; i < number_of_keys; i++)
{
int e = pthread_key_create(keys + i, NULL);
if (e)
{
printf("ERROR (%d): index: %zu, pthread_key_create (%d)\n", getpid(), i, e);
break;
}
counter++;
}
return counter;
}
int main(int argc, char const *argv[])
{
printf("maximum number of thread keys: %ld\n", sysconf(_SC_THREAD_KEYS_MAX));
printf("process id: %d\n", getpid());
const size_t number_of_keys = 1024;
pthread_key_t keys_1[number_of_keys];
memset(keys_1, 0, number_of_keys * sizeof(pthread_key_t));
printf("INFO (%d): number of active keys: %zu\n", getpid(), create_keys(keys_1, number_of_keys));
pid_t p = fork();
if (p == 0)
{
printf("process id: %d\n", getpid());
pthread_key_t keys_2[number_of_keys];
memset(keys_2, 0, number_of_keys * sizeof(pthread_key_t));
printf("INFO (%d): number of active keys: %zu\n", getpid(), create_keys(keys_2, number_of_keys));
}
return 0;
}
When I run this example on Ubuntu 16.04, the child process cannot create any new thread key if the parent has already created as many keys as the limit (1024). But if I use 512 keys each for the parent and the child, it runs without error.
As you know, fork() traditionally works by copying the process in memory and then continuing execution from the same point within each copy as parent and child. This is what the return code of fork() indicates.
In order to perform fork(), the internals of the process must be duplicated. Memory, stack, open files, and probably thread local storage keys. Each system is different in its implementation of fork(). Some systems allow you to customise the areas of the process that get copied (see Linux clone(2) interface). However, the concept remains the same.
So, on to your example code: if you allocate 1024 keys in the parent, every child process inherits a full key table and has no spare keys to work with, resulting in the errors. If you allocate only 512 keys in the parent, then every child inherits a half-empty keys table and has 512 spare keys to play with, hence no errors arise.
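The inheritance described above can be checked directly. The following is a sketch under assumptions, not code from the question: hold_keys(), drop_keys(), probe_free_keys() and free_keys_in_child() are hypothetical helper names, and the probe buffer of 4096 entries is assumed to exceed the system limit (1024 on glibc).

```c
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Create up to want keys in keys[]; returns how many were created. */
long hold_keys(pthread_key_t *keys, long want)
{
    assert(keys != NULL);
    long n = 0;
    while (n < want && pthread_key_create(&keys[n], NULL) == 0)
        n++;
    return n;
}

/* Give n keys back to the process-wide key table. */
void drop_keys(const pthread_key_t *keys, long n)
{
    for (long i = 0; i < n; i++)
        pthread_key_delete(keys[i]);
}

/* Count how many keys can still be created, then release them again. */
long probe_free_keys(void)
{
    pthread_key_t keys[4096];   /* assumed larger than the key limit */
    long n = hold_keys(keys, 4096);
    drop_keys(keys, n);
    return n;
}

/* Fork a child, let it count its free keys, and return that count to the
 * parent through a pipe. Returns -1 on failure. */
long free_keys_in_child(void)
{
    int fds[2];
    long result = -1;

    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {   /* child: starts with a copy of the parent's key table */
        long n = probe_free_keys();
        ssize_t w = write(fds[1], &n, sizeof(n));
        _exit(w == (ssize_t)sizeof(n) ? 0 : 1);
    }
    close(fds[1]);
    if (read(fds[0], &result, sizeof(result)) != (ssize_t)sizeof(result))
        result = -1;
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return result;
}
```

If the parent holds 100 keys when it forks, the child should report 100 fewer free keys than an unencumbered process; once the parent calls drop_keys() before forking, the child sees the full table again.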
Maximum value:
#include <unistd.h>
#include <stdio.h>
int main ()
{
printf ("%ld\n", sysconf(_SC_THREAD_KEYS_MAX));
return 0;
}
Consider using pthread_key_delete.

Linux kernel driver: Finish 'completion' when device is removed

I am writing a kernel driver to send/receive data with a PCI Express device. For this first version of the driver I am creating a character device interface where the user can read data using a file.
Background
I want to implement a blocking read where the user requests data and the driver populates a user buffer. In order to block the user's read call, I am using a completion structure.
When the driver is loaded and the user requests a read the driver blocks as expected. If I were to finish the read then everything runs fine.
The problem
In order to be safe, whenever the module is removed I call the complete_all function, just in case someone removes the module or device in the middle of a read transaction.
Neither the remove nor the exit function is called, and both the module and the user application are blocked. I've tried the following three functions (shown with their associated results).
wait_for_completion(&dev->read_complete); //Blocks indefinitely, I need to reset the computer
retval = wait_for_completion_interruptible(&dev->read_complete); //I can kill the user application manually and then remove the driver
retval = wait_for_completion_killable(&dev->read_complete); //Same as interruptible
My expectation is that when the remove function is called I can call complete_all(&dev->read_complete) and the read function will return an error.
In order to remove external factors I've made a repo on github, so if anyone wants to see the behavior for themselves they just need to clone and follow the instructions:
Kernel Module Completion Test
The relevant parts of the module are here (/src/mymodule.c)
typedef struct {
struct cdev cdv;
struct class *cls;
struct device *dev;
struct completion complete;
} mymodule_t;
mymodule_t mymod;
//Sysfs 'mymodule_test' attribute (all but the actual function is left out for brevity)
static ssize_t mymodule_test_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t count)
{
int retval = 0;
int value = 0;
if (sscanf(buf, "%d", &value) == 1)
{
retval = strlen(buf);
}
if (value)
{
printk("Value is: %d\n", value);
if (!completion_done(&mymod.complete))
{
complete(&mymod.complete);
}
printk("Sent Completion\n");
}
return retval;
}
//FOPS (all but 'read' function is left out for brevity)
ssize_t mymodule_read(struct file *filp, char * buf, size_t count, loff_t *f_pos)
{
printk("Read!\n");
if (completion_done(&mymod.complete))
{
reinit_completion(&mymod.complete);
}
printk("Wait for Completion\n");
wait_for_completion_interruptible(&mymod.complete);
printk("After Completion\n");
return 0;
}
static int __init mymodule_init(void)
{
...
//Register class and device
//Configure character driver with fops
init_completion(&mymod.complete);
...
}
static void __exit mymodule_exit(void)
{
...
if (!completion_done(&mymod.complete))
{
printk("Send a completion!\n");
complete(&mymod.complete);
}
//Clean up the rest of the module
...
}
module_init(mymodule_init);
module_exit(mymodule_exit);
Here is the userland application I use to exercise this:
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <termios.h>
#include "mymodule.h"
#define FILEPATH "/dev/mymodule0"
#define TEST_SIZE 10
int main(void)
{
int fn = -1;
char buf[TEST_SIZE];
printf("Attempting to open file module file...\n");
fn = open(FILEPATH, O_RDWR);
if (fn < 0)
{
printf("Failed to open file!\n");
return -1;
}
printf("Attempting to read from the file...\n");
read(fn, &buf, TEST_SIZE);
printf("Finished reading from file\n");
return 0;
}
Here is the dmesg output when I
load the module
run the user application (it opens the file, attempts to read 10 characters, then exits)
write '1' to the sysfs attribute
unload the module
[3217633.993937] Registering Driver
[3217633.993995] Driver Initialized!
[3217643.747791] Opened!
[3217643.747800] Read!
[3217643.747801] Wait for Completion
[3217646.436780] Value is: 1
[3217646.436792] Sent Completion
[3217646.436806] After Completion
[3217646.437010] Closed!
[3217727.378388] Cleanup Module
[3217727.378393] Check if we need to complete anything
[3217727.378395] Send a completion!
[3217727.378397] Unregistering Character Driver
[3217727.378400] Give back all the numbers we requested
[3217727.378402] Remove the class driver
[3217727.378571] Release the class
[3217727.378593] Finished Cleanup Module, Exiting
If I run the following commands:
load the module
run the user application
unload the module
[3218223.442777] Registering Driver
[3218223.442934] Driver Initialized!
[3218229.378396] Opened!
[3218229.378419] Read!
[3218229.378422] Wait for Completion
then the module doesn't unload. If this were a real device, like a USB hard drive, it is possible that the user could remove the device in the middle of a read transaction. It seems like something is wrong, or perhaps I'm missing something. Am I missing something?
While a USB device can be removed at any time, its driver (i.e. the kernel module) cannot be unloaded while certain operations on that device, such as a read, are in progress. It is the driver that reports the absence of the device to the upper layer (e.g. the filesystem).

How to use a function and pass its variables to the user app defined in the linux driver LM70?

Hi, I would like to know whether it is possible to call/run the following function from user space.
static ssize_t lm70_sense_temp(struct device *dev,
struct device_attribute *attr, char *buf)
{
//some code
.
.
status = sprintf(buf, "%d\n", val); /* millidegrees Celsius */
.
.
//some code
}
This function is defined in the lm70.c driver located in the kernel/drivers/hwmon folder of the Linux source. Is it possible to pass the values of this function's internal variables to the user application? I would like to retrieve the value of the val variable in the above function...
I don't know the kernel internals well. However, I grepped for lm70_sense_temp in the entire kernel source tree, and it appears only in the file linux-3.7.1/drivers/hwmon/lm70.c, first as a static function, then as the argument to DEVICE_ATTR.
Then I googled for linux kernel DEVICE_ATTR and immediately found device.txt, which shows that you should probably read that value through sysfs, i.e. under /sys; also read sysfs-rules.txt. So a user application could very probably read something relevant under /sys/
I'm downvoting your question because I feel that you could have searched a few minutes like I did (and I am not a kernel expert).
You don't need to call this function from user space to get that value - it is already exported to you via sysfs.
You could use grep to find which hwmon device it is:
grep -rl "lm70" /sys/class/hwmon/*/name /sys/class/hwmon/*/*/name
Then you can read the temperature input from your user space program, e.g:
#include <stdio.h>
#include <stdlib.h> /* atoi() */
#include <fcntl.h>
#include <unistd.h> /* read(), close() */
#define SENSOR_FILE "/sys/class/hwmon/hwmon0/temp1_input"
int readSensor(void)
{
int fd, val = -1;
char buf[32];
fd = open(SENSOR_FILE, O_RDONLY);
if (fd < 0) {
printf("Failed to open %s\n", SENSOR_FILE);
return val;
}
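ssize_t n = read(fd, buf, sizeof(buf) - 1); /* leave room for the terminator */
if (n > 0) {
buf[n] = '\0'; /* read() does not NUL-terminate; atoi() needs a C string */
val = atoi(buf);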
printf("Sensor value = %d\n", val);
} else {
printf("Failed to read %s\n", SENSOR_FILE);
}
close(fd);
return val;
}
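Because the hwmon index (hwmon0 above) can differ between machines, a variant that takes the path found by the grep step as a parameter may be more convenient. This is only a sketch: read_sysfs_long() is a hypothetical helper name, and it assumes the attribute holds a single decimal integer, as temp1_input does.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Read one decimal integer from a sysfs-style attribute file.
 * Returns 0 and stores the value in *out on success, -1 on any failure. */
int read_sysfs_long(const char *path, long *out)
{
    assert(path != NULL && out != NULL);

    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    char buf[32];
    char *line = fgets(buf, sizeof(buf), fp);   /* fgets() NUL-terminates */
    fclose(fp);
    if (line == NULL)
        return -1;

    char *end;
    errno = 0;
    long val = strtol(buf, &end, 10);
    if (errno != 0 || end == buf)
        return -1;   /* out of range, or no digits at all */

    *out = val;
    return 0;
}
```

Since the driver reports millidegrees Celsius, a caller would divide the result by 1000.0 to get degrees.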
As others have already stated - you can't call kernel code from user space, thems the breaks.
You cannot call a driver function directly from user space.
If that function were exported with EXPORT_SYMBOL or EXPORT_SYMBOL_GPL, you could write a simple kernel module that calls it directly and send the result to user space through a FIFO or shared memory.
But in your case the function is static and not exported, so this approach is not available.

What is the "current" in Linux kernel source?

I'm studying the Linux kernel and I have a question.
I see that many Linux kernel source files use current->files. So what is current?
struct file *fget(unsigned int fd)
{
struct file *file;
struct files_struct *files = current->files;
rcu_read_lock();
file = fcheck_files(files, fd);
if (file) {
/* File object ref couldn't be taken */
if (file->f_mode & FMODE_PATH ||
!atomic_long_inc_not_zero(&file->f_count))
file = NULL;
}
rcu_read_unlock();
return file;
}
It's a pointer to the current process (i.e. the process that issued the system call).
On x86, it's defined in arch/x86/include/asm/current.h (similar files for other archs).
#ifndef _ASM_X86_CURRENT_H
#define _ASM_X86_CURRENT_H
#include <linux/compiler.h>
#include <asm/percpu.h>
#ifndef __ASSEMBLY__
struct task_struct;
DECLARE_PER_CPU(struct task_struct *, current_task);
static __always_inline struct task_struct *get_current(void)
{
return percpu_read_stable(current_task);
}
#define current get_current()
#endif /* __ASSEMBLY__ */
#endif /* _ASM_X86_CURRENT_H */
More information in Linux Device Drivers chapter 2:
The current pointer refers to the user process currently executing. During the execution of a system call, such as open or read, the current process is the one that invoked the call. Kernel code can use process-specific information by using current, if it needs to do so. [...]
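current is not an ordinary global variable: it is a macro that evaluates to a pointer of type struct task_struct * for the task that is currently running. You can find the definition of struct task_struct at [1].
files is a pointer to a struct files_struct, which contains information about the files opened by the current process.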
[1] http://students.mimuw.edu.pl/SO/LabLinux/PROCESY/ZRODLA/sched.h.html
This is the ARM64 definition, in arch/arm64/include/asm/current.h: https://elixir.bootlin.com/linux/latest/source/arch/arm64/include/asm/current.h
struct task_struct;
/*
* We don't use read_sysreg() as we want the compiler to cache the value where
* possible.
*/
static __always_inline struct task_struct *get_current(void)
{
unsigned long sp_el0;
asm ("mrs %0, sp_el0" : "=r" (sp_el0));
return (struct task_struct *)sp_el0;
}
#define current get_current()
which simply reads the sp_el0 register, which holds the pointer to the current process's task_struct.

Closing a file descriptor that is being polled

If I have two threads (Linux, NPTL), and I have one thread that is polling on one or more of file descriptors, and another is closing one of them, is that a reasonable action? Am I doing something that I shouldn't be doing in MT environment?
The main reason I consider doing that, is that I don't necessarily want to communicate with the polling thread, interrupt it, etc., I instead would like to just close the descriptor for whatever reasons, and when the polling thread wakes up, I expect the revents to contain POLLNVAL, which would be the indication that the file descriptor should just be thrown away by the thread before the next poll.
I've put together a simple test, which does show that POLLNVAL is exactly what's going to happen. However, in that case, POLLNVAL is only set when the timeout expires; closing the socket doesn't seem to make poll() return. If that's the case, I can send a signal to the polling thread to make poll() return early.
#define _GNU_SOURCE
#include <stdio.h>
#include <pthread.h>
#include <poll.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>
#include <signal.h>
static pthread_t main_thread;
void * close_some(void*a) {
printf("thread #2 (%d) is sleeping\n", getpid());
sleep(2);
close(0);
printf("socket closed\n");
// comment out the next line to not forcefully interrupt
pthread_kill(main_thread, SIGUSR1);
return 0;
}
void on_sig(int s) {
printf("signal received\n");
}
int main(int argc, char ** argv) {
pthread_t two;
struct pollfd pfd;
int rc;
struct sigaction act;
act.sa_handler = on_sig;
sigemptyset(&act.sa_mask);
act.sa_flags = 0;
sigaction(SIGUSR1, &act, 0);
main_thread = pthread_self();
pthread_create(&two, 0, close_some, 0);
pfd.fd = 0;
pfd.events = POLLIN | POLLRDHUP;
printf("thread 0 (%d) polling\n", getpid());
rc = poll(&pfd, 1, 7000);
if (rc < 0) {
printf("error : %s\n", strerror(errno));
} else if (!rc) {
printf("time out!\n");
} else {
printf("revents = %x\n", pfd.revents);
}
return 0;
}
For Linux at least, this seems risky. The manual page for close warns:
It is probably unwise to close file descriptors while they may be in
use by system calls in other threads in the same process. Since a
file descriptor may be reused, there are some obscure race conditions
that may cause unintended side effects.
Since you're on Linux, you could do the following:
Set up an eventfd and add it to the poll
Signal the eventfd (write to it) when you want to close a fd
In the poll, when you see activity on the eventfd you can immediately close a fd and remove it from poll
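The three steps above can be sketched roughly as follows. This is an illustration under assumptions, not code from the question: request_close() and poll_once() are hypothetical names, and the sketch handles a single data descriptor, whereas a real poller would loop and track several.

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

enum wake_reason { WAKE_ERROR = -1, WAKE_TIMEOUT = 0, WAKE_DATA = 1, WAKE_CLOSE = 2 };

/* Called by the thread that wants the descriptor closed. */
int request_close(int wake_fd)
{
    uint64_t one = 1;
    return write(wake_fd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/* Poll *data_fd and wake_fd together. On a close request the polling thread
 * itself closes *data_fd and sets it to -1, so no other thread ever closes a
 * descriptor that poll() is still using. */
enum wake_reason poll_once(int *data_fd, int wake_fd, int timeout_ms)
{
    assert(data_fd != NULL);

    struct pollfd pfds[2] = {
        { .fd = wake_fd,  .events = POLLIN },
        { .fd = *data_fd, .events = POLLIN },
    };

    int rc = poll(pfds, 2, timeout_ms);
    if (rc < 0)
        return WAKE_ERROR;
    if (rc == 0)
        return WAKE_TIMEOUT;

    if (pfds[0].revents & POLLIN) {
        uint64_t count;
        if (read(wake_fd, &count, sizeof(count)) != (ssize_t)sizeof(count))
            return WAKE_ERROR;
        close(*data_fd);
        *data_fd = -1;   /* poll() ignores entries with a negative fd */
        return WAKE_CLOSE;
    }
    return WAKE_DATA;
}
```

In a real program request_close() would run in the second thread while the first is blocked in poll_once(); the wake-up is checked before the data descriptor so that a pending close request wins over pending data.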
Alternatively you could simply establish a signal handler and check for errno == EINTR when poll returns. The signal handler would only need to set some global variable to the value of the fd you're closing.
Since you're on Linux you might want to consider epoll as a superior albeit non-standard alternative to poll.

Resources