I want to retrieve the sessionid from a task struct in an eBPF program. I have the following code in my eBPF program:
struct task_struct *task;
u32 sessionid;
task = (struct task_struct *)bpf_get_current_task();
sessionid = task->sessionid;
This runs, but the sessionid always ends up being -1. I read in this answer that I can use task_session to retrieve it, but I get an error about invalid memory access. I believe I need to use bpf_probe_read to move the task_struct that task points to onto the stack, but I can't get it to work. Is there anything I'm missing?
After a bit more digging through the task_struct struct I realised you could do this:
struct task_struct *task;
struct pid_link pid_link;
struct pid pid;
unsigned int sessionid;
task = (struct task_struct *)bpf_get_current_task();
bpf_probe_read(&pid_link, sizeof(pid_link), (void *)&task->group_leader->pids[PIDTYPE_SID]);
bpf_probe_read(&pid, sizeof(pid), (void *)pid_link.pid);
sessionid = pid.numbers[0].nr;
Related
I am new to kernel module progrmming. In my program I need to display the process name and its parent's pid.
The following is my simple_init() function
int simple_init(void){
printk(KERN_INFO "--------------Starting module--------------\n");
struct task_struct *intask = &init_task;
struct list_head fd = intask->children;
struct list_head *list;
struct task_struct *task;
int k =0;
list_for_each(list, &fd){
task = list_entry(list, struct task_struct, sibling);
struct task_struct *parent = task->parent;
pid_t parent_pid = parent -> pid;
printk(KERN_INFO "Name: %s ---- %d ----Parent: %d\n",task->comm, task->pid, parent_pid);
if (k==2) break;
k++;
}
return 0;
}
Problem is that after I added the line:
struct task_struct *parent = task->parent;
Now, when I run the insmod command it says Segementation fault, and I have to restart the machine (virtual machine) to try again.
Can anyone of you show me what's wrong with this?
Kernel code, in particular module initialization code, is not always running on behave of some process. Both interrrupt & scheduler related code is running without any particular process.
So I guess that your task could be NULL
I'm not sure what I'm doing incorrectly but it's time for some extra eyes. I make a device with device_create() providing some "extra data" as follows:
pDevice = device_create(ahcip_class, NULL, /*no parent*/
MKDEV(AHCIP_MAJOR, AHCIP_MINOR + i), &mydevs[i],
DRIVER_NAME "%d", AHCIP_MINOR + i);
Expecting that my sysfs attribute function is going to take a pointer to the struct kobject member of struct device I do the following with my attribute function
static ahcip_dev *get_ahcip_dev(struct kobject *ko)
{
ahcip_dev *adev = NULL;
struct device *pdev = container_of(ko, struct device, kobj);
if (!pdev) {
pr_err("%s:%d unable to find device struct in kobject\n",
__func__, __LINE__);
return NULL;
}
/* some debugging stuff */
pr_info("%s:%d mydevs[0] %p\n", __func__, __LINE__, mydevs);
pr_info("%s:%d mydevs[1] %p\n", __func__, __LINE__, mydevs+1);
pr_info("%s:%d mydevs[0].psysfs_dev %p\n", __func__, __LINE__,
mydevs->psysfs_dev);
pr_info("%s:%d mydevs[1].psysfs_dev %p\n", __func__, __LINE__,
(mydevs + 1)->psysfs_dev);
pr_info("%s:%d pdev %p\n", __func__, __LINE__, pdev);
adev = (ahcip_dev*)dev_get_drvdata(pdev);
/* return the pointer anyway, but if it's null, print to klog */
if (!adev)
pr_err("%s:%d no ahcip_dev, private driver data is NULL\n",
__func__, __LINE__);
return adev;
}
static ssize_t pxis_show(struct kobject *kobj, struct kobj_attribute *attr,
char *buff)
{
u32 pi = 0;
ahcip_dev *adev = get_ahcip_dev(kobj);
/* get_ahcip_dev() will print what happened, this needs to return
* error code
*/
if (!adev)
return -EIO;
pi = adev->port_index;
return sprintf(buff, "%08x\n", get_port_reg(adev->hba->ports[pi], 0x10));
}
The output (condensed) from the above function shows:
get_ahcip_dev:175 mydevs[1].psysfs_dev ffff88011b2b4800
get_ahcip_dev:176 pdev ffff88011b2b47f0
pdev in this case should point to the same memory location as mydevs[1].psysfs_dev but it's 16 bytes "earlier". What am I doing wrong?
I don't like to answer my own questions but this seems appropriate in this case. The root of the problem was a faulty assumption of what the attribute function needed to process. Attribute functions have this prototype you can view in context here
ssize_t (*show)(struct kobject *, struct attribute *,char *);
ssize_t (*store)(struct kobject *,struct attribute *,const char *, size_t);
Since the device_create() function returns a struct device object defined as follows (excerpt only, see full def here)
struct device {
struct device *parent;
struct device_private *p;
struct kobject kobj;
...
}
I assumed that it was this pointer that my attribute function must process. Reading through the description of the problem shows that when I used the container_of macro to get what I thought was the address of the containing struct device I was 16 bytes "to early." Notice, the first two fields of this structure are pointers. On a 64 bit system, which mine is, this is 16 bytes.
Because the function prototypes are defined as shown above, I assumed I was getting a reference to the kobj field. Instead, I was getting the actual struct device object. In other words, I was getting the address I wanted upon function entry and was still trying to find it.
Something about this may be documented in the labyrinth of Linux kernel documentation. If anyone knows of it, please put a link here. I read much but didn't see this one coming. I hope this question and answer helps some other kernel newbie.
Your example did not show how you created the sysfs attribute nor how your pxis_show() function was associated with it. Since you're trying to connect this attribute to a struct device, might you happen to be calling device_create_file() to create the sysfs node? If so, examining the callback prototypes of struct device_attribute might make more sense. You will see that the show/store callbacks look like:
ssize_t (*show)(struct device *dev, struct device_attribute *attr, char *buf);
ssize_t (*store)(struct device *dev, struct device_attribute *attr,
const char *buf, size_t count);
In which case that might better explain your extra offset issue--the first parameter is actually already pointing to your struct device, and not the embedded kobj member, so that performing a container_of() will take you back an additional 16 bytes.
You might be confusing the device_attribute callbacks with the struct sysfs_ops format, as they look similar, but notice the parameter types are different. In fact, there is a mapping from sysfs_ops to struct device_attribute as can be seen with the wrapper functions dev_attr_show and dev_attr_store as can be found here
I am creating a kernel module to find the resident pages for all the process. I am using get_mm_rss() and for_each_process but it works
only for init process / first time after first iteration it doesn't work.
int __init schedp(void){
struct task_struct *p;
for_each_process(p) {
int pid = task_pid_nr(p);
printk("Process: %s (pid = %d) , (rpages : %lu)\n",
p->comm, pid, get_mm_rss(p->mm));
}
return 0;
}
Results:
BUG: unable to handle kernel NULL pointer dereference at 00000160,
You're probably getting NULL in p->mm, because some tasks may have invalid mm pointer because they are exiting or don't have mm (because they are kernel-threads, not sure).
When you confused on how to use kernel API, always look for examples inside kernel itself. Quick search with cross-reference tool gave me kernel/cpu.c:
for_each_process(p) {
struct task_struct *t;
/*
* Main thread might exit, but other threads may still have
* a valid mm. Find one.
*/
t = find_lock_task_mm(p);
if (!t)
continue;
cpumask_clear_cpu(cpu, mm_cpumask(t->mm));
task_unlock(t);
}
Note that you need to call find_lock_task_mm() and task_unlock() and explicitly check for NULL.
finally it works after creating a function which checks a mm_struct is valid or not for_each_process
struct task_struct *task_mm(struct task_struct *p){
struct task_struct *t;
rcu_read_lock();
for_each_thread(p, t) {
task_lock(t);
if (likely(t->mm))
goto found;
task_unlock(t);
}
t = NULL;
found:
rcu_read_unlock();
return t;
}
As I was going through the below chunk of Linux char driver code, I found the structure pointer current in printk.
I want to know what structure the current is pointing to and its complete elements.
What purpose does this structure serve?
ssize_t sleepy_read (struct file *filp, char __user *buf, size_t count, loff_t *pos)
{
printk(KERN_DEBUG "process %i (%s) going to sleep\n",
current->pid, current->comm);
wait_event_interruptible(wq, flag != 0);
flag = 0;
printk(KERN_DEBUG "awoken %i (%s)\n", current->pid, current->comm);
return 0;
}
It is a pointer to the current process ie, the process which has issued the system call.
From the docs:
The Current Process
Although kernel modules don't execute sequentially as applications do,
most actions performed by the kernel are related to a specific
process. Kernel code can know the current process driving it by
accessing the global item current, a pointer to struct task_struct,
which as of version 2.4 of the kernel is declared in
<asm/current.h>, included by <linux/sched.h>. The current pointer
refers to the user process currently executing. During the execution
of a system call, such as open or read, the current process is the one
that invoked the call. Kernel code can use process-specific
information by using current, if it needs to do so. An example of this
technique is presented in "Access Control on a Device File", in
Chapter 5, "Enhanced Char Driver Operations".
Actually, current is not properly a global variable any more, like it
was in the first Linux kernels. The developers optimized access to the
structure describing the current process by hiding it in the stack
page. You can look at the details of current in <asm/current.h>. While
the code you'll look at might seem hairy, we must keep in mind that
Linux is an SMP-compliant system, and a global variable simply won't
work when you are dealing with multiple CPUs. The details of the
implementation remain hidden to other kernel subsystems though, and a
device driver can just include and refer to the
current process.
From a module's point of view, current is just like the external
reference printk. A module can refer to current wherever it sees fit.
For example, the following statement prints the process ID and the
command name of the current process by accessing certain fields in
struct task_struct:
printk("The process is \"%s\" (pid %i)\n",
current->comm, current->pid);
The command name stored in current->comm is the base name of the
program file that is being executed by the current process.
Here is the complete structure the "current" is pointing to
task_struct
Each task_struct data structure describes a process or task in the system.
struct task_struct {
/* these are hardcoded - don't touch */
volatile long state; /* -1 unrunnable, 0 runnable, >0 stopped */
long counter;
long priority;
unsigned long signal;
unsigned long blocked; /* bitmap of masked signals */
unsigned long flags; /* per process flags, defined below */
int errno;
long debugreg[8]; /* Hardware debugging registers */
struct exec_domain *exec_domain;
/* various fields */
struct linux_binfmt *binfmt;
struct task_struct *next_task, *prev_task;
struct task_struct *next_run, *prev_run;
unsigned long saved_kernel_stack;
unsigned long kernel_stack_page;
int exit_code, exit_signal;
/* ??? */
unsigned long personality;
int dumpable:1;
int did_exec:1;
int pid;
int pgrp;
int tty_old_pgrp;
int session;
/* boolean value for session group leader */
int leader;
int groups[NGROUPS];
/*
* pointers to (original) parent process, youngest child, younger sibling,
* older sibling, respectively. (p->father can be replaced with
* p->p_pptr->pid)
*/
struct task_struct *p_opptr, *p_pptr, *p_cptr,
*p_ysptr, *p_osptr;
struct wait_queue *wait_chldexit;
unsigned short uid,euid,suid,fsuid;
unsigned short gid,egid,sgid,fsgid;
unsigned long timeout, policy, rt_priority;
unsigned long it_real_value, it_prof_value, it_virt_value;
unsigned long it_real_incr, it_prof_incr, it_virt_incr;
struct timer_list real_timer;
long utime, stime, cutime, cstime, start_time;
/* mm fault and swap info: this can arguably be seen as either
mm-specific or thread-specific */
unsigned long min_flt, maj_flt, nswap, cmin_flt, cmaj_flt, cnswap;
int swappable:1;
unsigned long swap_address;
unsigned long old_maj_flt; /* old value of maj_flt */
unsigned long dec_flt; /* page fault count of the last time */
unsigned long swap_cnt; /* number of pages to swap on next pass */
/* limits */
struct rlimit rlim[RLIM_NLIMITS];
unsigned short used_math;
char comm[16];
/* file system info */
int link_count;
struct tty_struct *tty; /* NULL if no tty */
/* ipc stuff */
struct sem_undo *semundo;
struct sem_queue *semsleeping;
/* ldt for this task - used by Wine. If NULL, default_ldt is used */
struct desc_struct *ldt;
/* tss for this task */
struct thread_struct tss;
/* filesystem information */
struct fs_struct *fs;
/* open file information */
struct files_struct *files;
/* memory management info */
struct mm_struct *mm;
/* signal handlers */
struct signal_struct *sig;
#ifdef __SMP__
int processor;
int last_processor;
int lock_depth; /* Lock depth.
We can context switch in and out
of holding a syscall kernel lock... */
#endif
};
I'm studying about Linux kernel and I have a problem.
I see many Linux kernel source files have current->files. So what is the current?
struct file *fget(unsigned int fd)
{
struct file *file;
struct files_struct *files = current->files;
rcu_read_lock();
file = fcheck_files(files, fd);
if (file) {
/* File object ref couldn't be taken */
if (file->f_mode & FMODE_PATH ||
!atomic_long_inc_not_zero(&file->f_count))
file = NULL;
}
rcu_read_unlock();
return file;
}
It's a pointer to the current process (i.e. the process that issued the system call).
On x86, it's defined in arch/x86/include/asm/current.h (similar files for other archs).
#ifndef _ASM_X86_CURRENT_H
#define _ASM_X86_CURRENT_H
#include <linux/compiler.h>
#include <asm/percpu.h>
#ifndef __ASSEMBLY__
struct task_struct;
DECLARE_PER_CPU(struct task_struct *, current_task);
static __always_inline struct task_struct *get_current(void)
{
return percpu_read_stable(current_task);
}
#define current get_current()
#endif /* __ASSEMBLY__ */
#endif /* _ASM_X86_CURRENT_H */
More information in Linux Device Drivers chapter 2:
The current pointer refers to the user process currently executing. During the execution of a system call, such as open or read, the current process is the one that invoked the call. Kernel code can use process-specific information by using current, if it needs to do so. [...]
Current is a global variable of type struct task_struct. You can find it's definition at [1].
Files is a struct files_struct and it contains information of the files used by the current process.
[1] http://students.mimuw.edu.pl/SO/LabLinux/PROCESY/ZRODLA/sched.h.html
this is ARM64 definition. in arch/arm64/include/asm/current.h, https://elixir.bootlin.com/linux/latest/source/arch/arm64/include/asm/current.h
struct task_struct;
/*
* We don't use read_sysreg() as we want the compiler to cache the value where
* possible.
*/
static __always_inline struct task_struct *get_current(void)
{
unsigned long sp_el0;
asm ("mrs %0, sp_el0" : "=r" (sp_el0));
return (struct task_struct *)sp_el0;
}
#define current get_current()
which just use the sp_el0 register. As the pointer to current process's task_struct