Calling aarch64 shared library from amd64 executable, maybe using binary translation/QEMU - linux

I have an aarch64 library for Linux and I want to use it from within an amd64 Linux install. Currently, I know one method of getting this to work, which is to use the qemu-arm-static binary emulator with an aarch64 executable I compile myself, that calls dlopen on the aarch64 library and uses it.
The annoyance is that integrating the aarch64 executable with my amd64 environment is annoying (eg. let's say, for example, this arm64 library is from an IoT device and decodes a special video file in real time—how am I supposed to use the native libraries on my computer to play it?). I end up using UNIX pipes, but I really dislike this solution.
Is there a way I can use the qemu-arm-static stuff only with the library, so I can have an amd64 executable that directly calls the library? If not, what's the best way to interface between the two architectures? Is it pipes?

The solution that I implemented for this is to use shared memory IPC. This solution is particularly nice since it integrates pretty well with fixed-length C structs, allowing you to simply just use a struct on one end and the other end.
Let's say you have a function with a signature uint32_t so_lib_function_a(uint32_t c[2])
You can write a wrapper function in an amd64 library: uint32_t wrapped_so_lib_function_a(uint32_t c[2]).
Then, you create a shared memory structure:
typedef struct {
uint32_t c[2];
uint32_t ret;
int turn; // turn = 0 means amd64 library, turn = 1 means arm library
} ipc_call_struct;
Initialise a struct like this, and then run shmget(SOME_SHM_KEY, sizeof(ipc_call_struct), IPC_CREAT | 0777);, get the return value from that, and then get a pointer to the shared memory. Then copy the initialised struct into shared memory.
You then run shmget(3) and shmat(3) on the ARM binary side, getting a pointer to the shared memory as well. The ARM binary runs an infinite loop, waiting for its "turn." When turn is set to 1, the amd64 binary will block in a forever loop until the turn is 0. The ARM binary will execute the function, using the shared struct details as parameters and updating the shared memory struct with the return value. Then the ARM library will set the turn to 0 and block until turn is 1 again, which will allow the amd64 binary to do its thing until it's ready to call the ARM function again.
Here is an example (it might not compile yet, but it gives you a general idea):
Our "unknown" library : shared.h
#include <stdint.h>
#define MAGIC_NUMBER 0x44E
uint32_t so_lib_function_a(uint32_t c[2]) {
// Add args and multiplies by MAGIC_NUMBER
uint32_t ret;
for (int i = 0; i < 2; i++) {
ret += c[i];
}
ret *= MAGIC_NUMBER;
return ret;
}
Hooking into the "unknown" library: shared_executor.c
#include <dlfcn.h>
#include <sys/shm.h>
#include <stdint.h>
#define SHM_KEY 22828 // Some random SHM ID
uint32_t (*so_lib_function_a)(uint32_t c[2]);
typedef struct {
uint32_t c[2];
uint32_t ret;
int turn; // turn = 0 means amd64 library, turn = 1 means arm library
} ipc_call_struct;
int main() {
ipc_call_struct *handle;
void *lib_dlopen = dlopen("./shared.so", RTLD_LAZY);
so_lib_function_a = dlsym(lib_dlopen, "so_lib_function_a");
// setup shm
int shm_id = shmget(SHM_KEY, sizeof(ipc_call_struct), IPC_CREAT | 0777);
handle = shmat(shm_id, NULL, 0);
// We expect the handle to already be initialised by the time we get here, so we don't have to do anything
while (true) {
if (handle->turn == 1) { // our turn
handle->ret = so_lib_function_a(handle->c);
handle->turn = 0; // hand off for later
}
}
}
On the amd64 side: shm_shared.h
#include <stdint.h>
#include <sys/shm.h>
typedef struct {
uint32_t c[2];
uint32_t ret;
int turn; // turn = 0 means amd64 library, turn = 1 means arm library
} ipc_call_struct;
#define SHM_KEY 22828 // Some random SHM ID
static ipc_call_struct* handle;
void wrapper_init() {
// setup shm here
int shm_id = shmget(SHM_KEY, sizeof(ipc_call_struct), IPC_CREAT | 0777);
handle = shmat(shm_id, NULL, 0);
// Initialise the handle
// Currently, we don't want to call the ARM library, so the turn is still zero
ipc_call_struct temp_handle = { .c={0}, .ret=0, .turn=0 };
*handle = temp_handle;
// you should be able to fork the ARM binary using "qemu-arm-static" here
// (and add code for that if you'd like)
}
uint32_t wrapped_so_lib_function_a(uint32_t c[2]) {
handle->c = c;
handle->turn = 1; // hand off execution to the ARM librar
while (handle->turn != 0) {} // wait
return handle->ret;
}
Again, there's no guarantee this code even compiles (yet), but just a general idea.

Related

Is it possible to update the flags of an existing memory mapping?

I am writing a small program in x64 assembly that will spawn children, all sharing their memory mappings so they can modify each other's code. For this, since the argument CLONE_VM of sys_clone seems to place the program into undefined behaviour, I plan to use mmap's MAP_SHARED argument.
However, I would also need the children to modify the code of the father. One option is to also allocate a MAP_SHARED mapping and give it to the father, but I'd like to avoid this if possible (only for elegance reasons).
Since the base mapping (the 0x400000 one on 64-bits Linux) of the program will not have the MAP_SHARED flag by default, is it possible to update it using a syscall to set this flag? munmap then mmap will not do and cause a SIGSEGV, and mprotect can only change the RWX permissions.
You can't change whether an existing mapping is private or shared, but you can map a new shared mapping over an existing private mapping. You can even do so in C, like this:
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
int main(void) {
FILE *stream = fopen("/proc/self/maps", "rb");
if(!stream) {
perror("fopen");
return 1;
}
char *text_start, *text_end;
do {
if(fscanf(stream, " %p-%p%*[^\n]", &text_start, &text_end) != 2) {
perror("scanf");
return 1;
}
} while(!(text_start <= main && main < text_end));
if(fclose(stream)) {
perror("fclose");
return 1;
}
size_t text_len = text_end - text_start;
char *mem = mmap(NULL, text_len, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_SHARED|MAP_ANONYMOUS, -1, 0);
if(mem == MAP_FAILED) {
perror("mmap");
return 1;
}
memcpy(mem, text_start, text_len);
__builtin___clear_cache(mem, mem + text_len);
if(mremap(mem, text_len, text_len, MREMAP_MAYMOVE|MREMAP_FIXED, text_start) == MAP_FAILED) {
perror("mremap");
return 1;
}
/* you can check /proc/PID/maps now to see the new mapping */
getchar();
}
As a bonus, this program supports ASLR and doesn't require that the text section start at 0x400000.

Finding out which threads own a pthread_rwlock_t

How can I see (on linux) which threads own a pthread_rwlock_t (or std::shared_mutex) ?
For a regular mutex there's Is it possible to determine the thread holding a mutex? but how to do this for a r/w lock?
Your good question has a few problems with a complete answer:
It is dependent upon which OS kernel you are running, and possibly even which version.
It is dependent upon which libc/libpthread you are using, and possibly even which compiler.
Presuming you can untangle a specific configuration above, we are looking at two different outcomes: there is a single writer, or a set of readers who currently cause pthread_rw_lock() to block for some callers.
In contrast, a mutex has only ever one owner and only that owner can release it.
So a first test in finding out whether your system has a findable owner set is to see if it actually records the ownership. A quick hack to this would be have one thread acquire the rwlock for read or write, then signal a second one to release it. If the release fails, your implementation properly records it, and you have a chance; if not, your implementation likely implements rwlocks with mutex + condvar + counter; so it has no idea of ownership.
So, some code:
#include <stdio.h>
#include <pthread.h>
pthread_mutex_t lock;
pthread_cond_t cv;
pthread_rwlock_t rwl;
int ready;
void *rlse(void *_)
{
int *t = _;
pthread_mutex_lock(&lock);
while (ready == 0) {
pthread_cond_wait(&cv, &lock);
}
ready = 0;
pthread_cond_signal(&cv);
pthread_mutex_unlock(&lock);
*t = pthread_rwlock_unlock(&rwl);
return 0;
}
void *get(void *_)
{
int *t = _;
*t = (*t) ? pthread_rwlock_wrlock(&rwl) : pthread_rwlock_rdlock(&rwl);
if (*t == 0) {
pthread_mutex_lock(&lock);
ready = 1;
pthread_cond_signal(&cv);
while (ready == 1) {
pthread_cond_wait(&cv, &lock);
}
pthread_mutex_unlock(&lock);
}
return 0;
}
int main()
{
pthread_t acq, rel;
int v0, v1;
int i;
for (i = 0; i < 2; i++) {
pthread_rwlock_init(&rwl, 0);
pthread_mutex_init(&lock, 0);
pthread_cond_init(&cv, 0);
v0 = i;
pthread_create(&acq, 0, get, &v0);
pthread_create(&rel, 0, rlse, &v1);
pthread_join(acq, 0);
pthread_join(rel, 0);
printf("%s: %d %d\n", i ? "write" : "read", v0, v1);
}
return 0;
}
which we run as:
u18:src $ cc rw.c -lpthread -o rw
u18:src $ ./rw
read: 0 0
write: 0 0
This is telling us that in either case (rdlock, wrlock), a thread different from the calling thread can release the rwlock, thus it fundamentally has no owner.
With a little less sense of discovery, we could have found out a bit by reading the manpage for pthread_rwlock_unlock, which states that this condition is undefined behavior, which is the great cop-out.
Posix establishes a base, not a limit, so it is possible your implementation can support this sort of ownership. I program like the one above is a good investigative tool; if it turns up something like ENOTOWNER; but EINVAL would be pretty non-committal.
The innards of glic's rwlock (sysdeps/htl/bits/types/struct___pthread_rwlock.h):
struct __pthread_rwlock
{
__pthread_spinlock_t __held;
__pthread_spinlock_t __lock;
int __readers;
struct __pthread *__readerqueue;
struct __pthread *__writerqueue;
struct __pthread_rwlockattr *__attr;
void *__data;
};
confirms our suspicion; at first I was hopeful, with the queues and all, but a little diving through the code revealed them to be the waiting lists, not the owner lists.
The above was run on ubuntu 18.04; linux 4.15.0-112-generic; gcc 7.5.0; glibc 2.27 libpthread 2.27.

Intercepting a system call [duplicate]

I'm trying to write some simple test code as a demonstration of hooking the system call table.
"sys_call_table" is no longer exported in 2.6, so I'm just grabbing the address from the System.map file, and I can see it is correct (Looking through the memory at the address I found, I can see the pointers to the system calls).
However, when I try to modify this table, the kernel gives an "Oops" with "unable to handle kernel paging request at virtual address c061e4f4" and the machine reboots.
This is CentOS 5.4 running 2.6.18-164.10.1.el5. Is there some sort of protection or do I just have a bug? I know it comes with SELinux, and I've tried putting it in to permissive mode, but it doesn't make a difference
Here's my code:
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/unistd.h>
void **sys_call_table;
asmlinkage int (*original_call) (const char*, int, int);
asmlinkage int our_sys_open(const char* file, int flags, int mode)
{
printk("A file was opened\n");
return original_call(file, flags, mode);
}
int init_module()
{
// sys_call_table address in System.map
sys_call_table = (void*)0xc061e4e0;
original_call = sys_call_table[__NR_open];
// Hook: Crashes here
sys_call_table[__NR_open] = our_sys_open;
}
void cleanup_module()
{
// Restore the original call
sys_call_table[__NR_open] = original_call;
}
I finally found the answer myself.
http://www.linuxforums.org/forum/linux-kernel/133982-cannot-modify-sys_call_table.html
The kernel was changed at some point so that the system call table is read only.
cypherpunk:
Even if it is late but the Solution
may interest others too: In the
entry.S file you will find: Code:
.section .rodata,"a"
#include "syscall_table_32.S"
sys_call_table -> ReadOnly You have to
compile the Kernel new if you want to
"hack" around with sys_call_table...
The link also has an example of changing the memory to be writable.
nasekomoe:
Hi everybody. Thanks for replies. I
solved the problem long ago by
modifying access to memory pages. I
have implemented two functions that do
it for my upper level code:
#include <asm/cacheflush.h>
#ifdef KERN_2_6_24
#include <asm/semaphore.h>
int set_page_rw(long unsigned int _addr)
{
struct page *pg;
pgprot_t prot;
pg = virt_to_page(_addr);
prot.pgprot = VM_READ | VM_WRITE;
return change_page_attr(pg, 1, prot);
}
int set_page_ro(long unsigned int _addr)
{
struct page *pg;
pgprot_t prot;
pg = virt_to_page(_addr);
prot.pgprot = VM_READ;
return change_page_attr(pg, 1, prot);
}
#else
#include <linux/semaphore.h>
int set_page_rw(long unsigned int _addr)
{
return set_memory_rw(_addr, 1);
}
int set_page_ro(long unsigned int _addr)
{
return set_memory_ro(_addr, 1);
}
#endif // KERN_2_6_24
Here's a modified version of the original code that works for me.
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/unistd.h>
#include <asm/semaphore.h>
#include <asm/cacheflush.h>
void **sys_call_table;
asmlinkage int (*original_call) (const char*, int, int);
asmlinkage int our_sys_open(const char* file, int flags, int mode)
{
printk("A file was opened\n");
return original_call(file, flags, mode);
}
int set_page_rw(long unsigned int _addr)
{
struct page *pg;
pgprot_t prot;
pg = virt_to_page(_addr);
prot.pgprot = VM_READ | VM_WRITE;
return change_page_attr(pg, 1, prot);
}
int init_module()
{
// sys_call_table address in System.map
sys_call_table = (void*)0xc061e4e0;
original_call = sys_call_table[__NR_open];
set_page_rw(sys_call_table);
sys_call_table[__NR_open] = our_sys_open;
}
void cleanup_module()
{
// Restore the original call
sys_call_table[__NR_open] = original_call;
}
Thanks Stephen, your research here was helpful to me. I had a few problems, though, as I was trying this on a 2.6.32 kernel, and getting WARNING: at arch/x86/mm/pageattr.c:877 change_page_attr_set_clr+0x343/0x530() (Not tainted) followed by a kernel OOPS about not being able to write to the memory address.
The comment above the mentioned line states:
// People should not be passing in unaligned addresses
The following modified code works:
int set_page_rw(long unsigned int _addr)
{
return set_memory_rw(PAGE_ALIGN(_addr) - PAGE_SIZE, 1);
}
int set_page_ro(long unsigned int _addr)
{
return set_memory_ro(PAGE_ALIGN(_addr) - PAGE_SIZE, 1);
}
Note that this still doesn't actually set the page as read/write in some situations. The static_protections() function, which is called inside of set_memory_rw(), removes the _PAGE_RW flag if:
It's in the BIOS area
The address is inside .rodata
CONFIG_DEBUG_RODATA is set and the kernel is set to read-only
I found this out after debugging why I still got "unable to handle kernel paging request" when trying to modify the address of kernel functions. I was eventually able to solve that problem by finding the page table entry for the address myself and manually setting it to writable. Thankfully, the lookup_address() function is exported in version 2.6.26+. Here is the code I wrote to do that:
void set_addr_rw(unsigned long addr) {
unsigned int level;
pte_t *pte = lookup_address(addr, &level);
if (pte->pte &~ _PAGE_RW) pte->pte |= _PAGE_RW;
}
void set_addr_ro(unsigned long addr) {
unsigned int level;
pte_t *pte = lookup_address(addr, &level);
pte->pte = pte->pte &~_PAGE_RW;
}
Finally, while Mark's answer is technically correct, it'll case problem when ran inside Xen. If you want to disable write-protect, use the read/write cr0 functions. I macro them like this:
#define GPF_DISABLE write_cr0(read_cr0() & (~ 0x10000))
#define GPF_ENABLE write_cr0(read_cr0() | 0x10000)
Hope this helps anyone else who stumbles upon this question.
Note that the following will also work instead of using change_page_attr and cannot be depreciated:
static void disable_page_protection(void) {
unsigned long value;
asm volatile("mov %%cr0,%0" : "=r" (value));
if (value & 0x00010000) {
value &= ~0x00010000;
asm volatile("mov %0,%%cr0": : "r" (value));
}
}
static void enable_page_protection(void) {
unsigned long value;
asm volatile("mov %%cr0,%0" : "=r" (value));
if (!(value & 0x00010000)) {
value |= 0x00010000;
asm volatile("mov %0,%%cr0": : "r" (value));
}
}
If you are dealing with kernel 3.4 and later (it can also work with earlier kernels, I didn't test it) I would recommend a smarter way to acquire the system callы table location.
For example
#include <linux/module.h>
#include <linux/kallsyms.h>
static unsigned long **p_sys_call_table;
/* Aquire system calls table address */
p_sys_call_table = (void *) kallsyms_lookup_name("sys_call_table");
That's it. No addresses, it works fine with every kernel I've tested.
The same way you can use a not exported Kernel function from your module:
static int (*ref_access_remote_vm)(struct mm_struct *mm, unsigned long addr,
void *buf, int len, int write);
ref_access_remote_vm = (void *)kallsyms_lookup_name("access_remote_vm");
Enjoy!
As others have hinted, the whole story is a bit different now on modern kernels. I'll be covering x86-64 here, for syscall hijacking on modern arm64 refer to this other answer of mine. Also NOTE: this is plain and simple syscall hijacking. Non-invasive hooking can be done in a much nicer way using kprobes.
Since Linux v4.17, x86 (both 64 and 32 bit) now uses syscall wrappers that take a struct pt_regs * as the only argument (see commit 1, commit 2). You can see arch/x86/include/asm/syscall.h for the definitions.
Additionally, as others have described already in different answers, the simplest way to modify sys_call_table is to temporarily disable CR0 WP (Write-Protect) bit, which could be done using read_cr0() and write_cr0(). However, since Linux v5.3, [native_]write_cr0 will check sensitive bits that should never change (like WP) and refuse to change them (commit). In order to work around this, we need to write CR0 manually using inline assembly.
Here is a working kernel module (tested on Linux 5.10 and 5.18) that does syscall hijacking on modern Linux x86-64 considering the above caveats and assuming that you already know the address of sys_call_table (if you also want to find that in the module, see Proper way of getting the address of non-exported kernel symbols in a Linux kernel module):
// SPDX-License-Identifier: (GPL-2.0 OR MIT)
/**
* Test syscall table hijacking on x86-64. This module will replace the `read`
* syscall with a simple wrapper which logs every invocation of `read` using
* printk().
*
* Tested on Linux x86-64 v5.10, v5.18.
*
* Usage:
*
* sudo cat /proc/kallsyms | grep sys_call_table # grab address
* sudo insmod syscall_hijack.ko sys_call_table_addr=0x<address_here>
*/
#include <linux/init.h> // module_{init,exit}()
#include <linux/module.h> // THIS_MODULE, MODULE_VERSION, ...
#include <linux/kernel.h> // printk(), pr_*()
#include <asm/special_insns.h> // {read,write}_cr0()
#include <asm/processor-flags.h> // X86_CR0_WP
#include <asm/unistd.h> // __NR_*
#ifdef pr_fmt
#undef pr_fmt
#endif
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
typedef long (*sys_call_ptr_t)(const struct pt_regs *);
static sys_call_ptr_t *real_sys_call_table;
static sys_call_ptr_t original_read;
static unsigned long sys_call_table_addr;
module_param(sys_call_table_addr, ulong, 0);
MODULE_PARM_DESC(sys_call_table_addr, "Address of sys_call_table");
// Since Linux v5.3 [native_]write_cr0 won't change "sensitive" CR0 bits, need
// to re-implement this ourselves.
static void write_cr0_unsafe(unsigned long val)
{
asm volatile("mov %0,%%cr0": "+r" (val) : : "memory");
}
static long myread(const struct pt_regs *regs)
{
pr_info("read(%ld, 0x%lx, %lx)\n", regs->di, regs->si, regs->dx);
return original_read(regs);
}
static int __init modinit(void)
{
unsigned long old_cr0;
real_sys_call_table = (typeof(real_sys_call_table))sys_call_table_addr;
pr_info("init\n");
// Temporarily disable CR0 WP to be able to write to read-only pages
old_cr0 = read_cr0();
write_cr0_unsafe(old_cr0 & ~(X86_CR0_WP));
// Overwrite syscall and save original to be restored later
original_read = real_sys_call_table[__NR_read];
real_sys_call_table[__NR_read] = myread;
// Restore CR0 WP
write_cr0_unsafe(old_cr0);
pr_info("init done\n");
return 0;
}
static void __exit modexit(void)
{
unsigned long old_cr0;
pr_info("exit\n");
old_cr0 = read_cr0();
write_cr0_unsafe(old_cr0 & ~(X86_CR0_WP));
// Restore original syscall
real_sys_call_table[__NR_read] = original_read;
write_cr0_unsafe(old_cr0);
pr_info("goodbye\n");
}
module_init(modinit);
module_exit(modexit);
MODULE_VERSION("0.1");
MODULE_DESCRIPTION("Test syscall table hijacking on x86-64.");
MODULE_AUTHOR("Marco Bonelli");
MODULE_LICENSE("Dual MIT/GPL");

Writing to eventfd from kernel module

I have created an eventfd instance in a userspace program using eventfd(). Is there a way in which I can pass some reference (a pointer to its struct or pid+fd pair) to this created instance of eventfd to a kernel module so that it can update the counter value?
Here is what I want to do:
I am developing a userspace program which needs to exchange data and signals with a kernel space module which I have written.
For transferring data, I am already using ioctl. But I want the kernel module to be able to signal the userspace program whenever new data is ready for it to consume over ioctl.
To do this, my userspace program will create a few eventfds in various threads. These threads will wait on these eventfds using select() and whenever the kernel module updates the counts on these eventfds, they will go on to consume the data by requesting for it over ioctl.
The problem is, how do I resolve the "struct file *" pointers to these eventfds from kernelspace? What kind of information bout the eventfds can I sent to kernel modules so that it can get the pointers to the eventfds? what functions would I use in the kernel module to get those pointers?
Is there better way to signal events to userspace from kernelspace?
I cannot let go of using select().
I finally figured out how to do this. I realized that each open file on a system could be identified by the pid of one of the processes which opened it and the fd corresponding to that file (within that process's context). So if my kernel module knows the pid and fd, it can look up the struct * task_struct of the process and from that the struct * files and finally using the fd, it can acquire the pointer to the eventfd's struct * file. Then, using this last pointer, it can write to the eventfd's counter.
Here are the codes for the userspace program and the kernel module that I wrote up to demonstrate the concept (which now work):
Userspace C code (efd_us.c):
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h> //Definition of uint64_t
#include <sys/eventfd.h>
int efd; //Eventfd file descriptor
uint64_t eftd_ctr;
int retval; //for select()
fd_set rfds; //for select()
int s;
int main() {
//Create eventfd
efd = eventfd(0,0);
if (efd == -1){
printf("\nUnable to create eventfd! Exiting...\n");
exit(EXIT_FAILURE);
}
printf("\nefd=%d pid=%d",efd,getpid());
//Watch efd
FD_ZERO(&rfds);
FD_SET(efd, &rfds);
printf("\nNow waiting on select()...");
fflush(stdout);
retval = select(efd+1, &rfds, NULL, NULL, NULL);
if (retval == -1){
printf("\nselect() error. Exiting...");
exit(EXIT_FAILURE);
} else if (retval > 0) {
printf("\nselect() says data is available now. Exiting...");
printf("\nreturned from select(), now executing read()...");
s = read(efd, &eftd_ctr, sizeof(uint64_t));
if (s != sizeof(uint64_t)){
printf("\neventfd read error. Exiting...");
} else {
printf("\nReturned from read(), value read = %lld",eftd_ctr);
}
} else if (retval == 0) {
printf("\nselect() says that no data was available");
}
printf("\nClosing eventfd. Exiting...");
close(efd);
printf("\n");
exit(EXIT_SUCCESS);
}
Kernel Module C code (efd_lkm.c):
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/pid.h>
#include <linux/sched.h>
#include <linux/fdtable.h>
#include <linux/rcupdate.h>
#include <linux/eventfd.h>
//Received from userspace. Process ID and eventfd's File descriptor are enough to uniquely identify an eventfd object.
int pid;
int efd;
//Resolved references...
struct task_struct * userspace_task = NULL; //...to userspace program's task struct
struct file * efd_file = NULL; //...to eventfd's file struct
struct eventfd_ctx * efd_ctx = NULL; //...and finally to eventfd context
//Increment Counter by 1
static uint64_t plus_one = 1;
int init_module(void) {
printk(KERN_ALERT "~~~Received from userspace: pid=%d efd=%d\n",pid,efd);
userspace_task = pid_task(find_vpid(pid), PIDTYPE_PID);
printk(KERN_ALERT "~~~Resolved pointer to the userspace program's task struct: %p\n",userspace_task);
printk(KERN_ALERT "~~~Resolved pointer to the userspace program's files struct: %p\n",userspace_task->files);
rcu_read_lock();
efd_file = fcheck_files(userspace_task->files, efd);
rcu_read_unlock();
printk(KERN_ALERT "~~~Resolved pointer to the userspace program's eventfd's file struct: %p\n",efd_file);
efd_ctx = eventfd_ctx_fileget(efd_file);
if (!efd_ctx) {
printk(KERN_ALERT "~~~eventfd_ctx_fileget() Jhol, Bye.\n");
return -1;
}
printk(KERN_ALERT "~~~Resolved pointer to the userspace program's eventfd's context: %p\n",efd_ctx);
eventfd_signal(efd_ctx, plus_one);
printk(KERN_ALERT "~~~Incremented userspace program's eventfd's counter by 1\n");
eventfd_ctx_put(efd_ctx);
return 0;
}
void cleanup_module(void) {
printk(KERN_ALERT "~~~Module Exiting...\n");
}
MODULE_LICENSE("GPL");
module_param(pid, int, 0);
module_param(efd, int, 0);
To run this, carry out the following steps:
Compile the userspace program (efd_us.out) and the kernel module (efd_lkm.ko)
Run the userspace program (./efd_us.out) and note the pid and efd values that it print. (for eg. "pid=2803 efd=3". The userspace program will wait endlessly on select()
Open a new terminal window and insert the kernel module passing the pid and efd as params: sudo insmod efd_lkm.ko pid=2803 efd=3
Switch back to the userspace program window and you will see that the userspace program has broken out of select and exited.
Consult the kernel source here:
http://lxr.free-electrons.com/source/fs/eventfd.c
Basically, send your userspace file descriptor, as produced by eventfd(), to your module via ioctl() or some other path. From the kernel, call eventfd_ctx_fdget() to get an eventfd context, then eventfd_signal() on the resulting context. Don't forget eventfd_ctx_put() when you're done with the context.
how do I resolve the "struct file *" pointers to these eventfds from kernelspace
You must resolve those pointers into data structures that this interface you've created has published (create new types and read the fields you want from struct file into it).
Is there better way to signal events to userspace from kernelspace?
Netlink sockets are another convenient way for the kernel to communicate with userspace. "Better" is in the eye of the beholder.

error: undefined reference to `sched_setaffinity' on windows xp

Basically the code below was intended for use on linux and maybe thats the reason I get the error because I'm using windows XP, but I figure that pthreads should work just as well on both machines. I'm using gcc as my compiler and I did link with -lpthread but I got the following error anyways.
|21|undefined reference to sched_setaffinity'|
|30|undefined reference tosched_setaffinity'|
If there is another method to setting the thread affinity using pthreads (on windows) let me know. I already know all about the windows.h thread affinity functions available but I want to keep things multiplatform. thanks.
#include <stdio.h>
#include <math.h>
#include <sched.h>
double waste_time(long n)
{
double res = 0;
long i = 0;
while(i <n * 200000)
{
i++;
res += sqrt (i);
}
return res;
}
int main(int argc, char **argv)
{
unsigned long mask = 1; /* processor 0 */
/* bind process to processor 0 */
if (sched_setaffinity(0, sizeof(mask), &mask) <0)//line 21
{
perror("sched_setaffinity");
}
/* waste some time so the work is visible with "top" */
printf ("result: %f\n", waste_time (2000));
mask = 2; /* process switches to processor 1 now */
if (sched_setaffinity(0, sizeof(mask), &mask) <0)//line 30
{
perror("sched_setaffinity");
}
/* waste some more time to see the processor switch */
printf ("result: %f\n", waste_time (2000));
}
sched_getaffinity() and sched_setaffinity() are strictly Linux-specific calls. Windows provides its own set of specific Win32 API calls that affect scheduling. See this answer for sample code for Windows.

Resources