How to use proc_pid_cmdline in kernel module - linux

I am writing a kernel module to get the list of pids with their complete process name. The proc_pid_cmdline() gives the complete process name;using same function /proc/*/cmdline gets the complete process name. (struct task_struct) -> comm gives hint of what process it is, but not the complete path.
I have included the function name, but it gives error because it does not know where to find the function.
How to use proc_pid_cmdline() in a module ?

You are not supposed to call proc_pid_cmdline().
It is a non-public function in fs/proc/base.c:
static int proc_pid_cmdline(struct seq_file *m, struct pid_namespace *ns,
struct pid *pid, struct task_struct *task)
However, what it does is simple:
get_cmdline(task, m->buf, PAGE_SIZE);
That is not likely to return the full path though and it will not be possible to determine the full path in every case. The arg[0] value may be overwritten, the file could be deleted or moved, etc. A process may exec() in a way which obscures the original command line, and all kinds of other maladies.
A scan of my Fedora 20 system /proc/*/cmdline turns up all kinds of less-than-useful results:
-F
BUG:
WARNING: at
WARNING: CPU:
INFO: possible recursive locking detecte
ernel BUG at
list_del corruption
list_add corruption
do_IRQ: stack overflow:
ear stack overflow (cur:
eneral protection fault
nable to handle kernel
ouble fault:
RTNL: assertion failed
eek! page_mapcount(page) went negative!
adness at
NETDEV WATCHDOG
ysctl table check failed
: nobody cared
IRQ handler type mismatch
Machine Check Exception:
Machine check events logged
divide error:
bounds:
coprocessor segment overrun:
invalid TSS:
segment not present:
invalid opcode:
alignment check:
stack segment:
fpu exception:
simd exception:
iret exception:
/var/log/messages
--
/usr/bin/abrt-dump-oops
-xtD

I have managed to solve a version of this problem. I wanted to access the cmdline of all PIDs but within the kernel itself (as opposed to a kernel module as the question states), but perhaps these principles can be applied to kernel modules as well?
What I did was, I added the following function to fs/proc/base.c
int proc_get_cmdline(struct task_struct *task, char * buffer) {
int i;
int ret = proc_pid_cmdline(task, buffer);
for(i = 0; i < ret - 1; i++) {
if(buffer[i] == '\0')
buffer[i] = ' ';
}
return 0;
}
I then added the declaration in include/linux/proc_fs.h
int proc_get_cmdline(struct task_struct *, char *);
At this point, I could access the cmdline of all processes within the kernel.
To access the task_struct, perhaps you could refer to kernel: efficient way to find task_struct by pid?.
Once you have the task_struct, you should be able to do something like:
char cmdline[256];
proc_get_cmdline(task, cmdline);
if(strlen(cmdline) > 0)
printk(" cmdline :%s\n", cmdline);
else
printk(" cmdline :%s\n", task->comm);
I was able to obtain the commandline of all processes this way.

To get the full path of the binary behind a process.
char * exepathp;
struct file * exe_file;
struct mm_struct *mm;
char exe_path [1000];
//straight up stolen from get_mm_exe_file
mm = get_task_mm(current);
down_read(&mm->mmap_sem); //lock read
exe_file = mm->exe_file;
if (exe_file) get_file(exe_file);
up_read(&mm->mmap_sem); //unlock read
//reduce exe path to a string
exepathp = d_path( &(exe_file->f_path), exe_path, 1000*sizeof(char) );
Where current is the task struct for the process you are interested in. The variable exepathp gets the string of the full path. This is slightly different than the process cmd, this is the path of binary which was loaded to start the process. Combining this path with the process cmd should give you the full path.

Related

Where does "Freeing unused kernel memory" come from?

I often see Freeing unused kernel memory: xxxK (......) from dmesg, but I can never find this log from kernel source code with the help of grep/rg.
Where does it come from?
That line of text does not exist as a single, complete string, hence your failure to grep it.
This all gets rolling when free_initmem() in init/main.c calls free_initmem_default().
The line in question originates from free_initmem_default() in include/linux/mm.h:
/*
* Default method to free all the __init memory into the buddy system.
* The freed pages will be poisoned with pattern "poison" if it's within
* range [0, UCHAR_MAX].
* Return pages freed into the buddy system.
*/
static inline unsigned long free_initmem_default(int poison)
{
extern char __init_begin[], __init_end[];
return free_reserved_area(&__init_begin, &__init_end,
poison, "unused kernel");
}
The rest of that text is from free_reserved_area() in mm/page_alloc.c:
unsigned long free_reserved_area(void *start, void *end, int poison, const char *s)
{
void *pos;
unsigned long pages = 0;
...
if (pages && s)
pr_info("Freeing %s memory: %ldK\n",
s, pages << (PAGE_SHIFT - 10));
return pages;
}
(Code excerpts from v5.2)
From my answer here:
Some functions in the kernel source code are marked with __init because they run only once during initialization. This instructs the compiler to mark a function in a special way. The linker collects all such functions and puts them at the end of the final binary file.
Example method signature:
static int __init clk_disable_unused(void)
{
// some code
}
When the kernel starts, this code runs only once during initialization. After it runs, the kernel can free this memory to reuse it and you will see the kernel
message:
Freeing unused kernel memory: 108k freed

How to access the /proc file system's iiterate function pointer

I am trying to create a simple linux rootkit that can be used to hide processess. The method I have chosen to try is to replace the pointer to "/proc" 's iterate function with a pointer to a custom one that will hide the processess I need. To do this I have to first save a pointer to it's originale iterate function so it can be replaced later. The '/proc' file system's iterate function can be accessed by accessing the 'iterate' function pointer which is a member of it's 'file_operations' structure.
I have tried the following two methods to access it, as can be seen in the code segment as label_1 and label_2, each tested with the other commented out.
static struct file *proc_filp;
static struct proc_dir_entry *test_proc;
static struct proc_dir_entry *proc_root;
static struct file_operations *fs_ops;
int (*proc_iterate) (struct file *, struct dir_context *);
static int __init module_init(void)
{
label_1:
proc_filp = filp_open("/proc", O_RDONLY | O_DIRECTORY, 0);
fs_ops = (struct file_operations *) proc_filp->f_op;
printk(KERN_INFO "file_operations is %p", fs_ops);
proc_iterate = fs_ops->iterate;
printk(KERN_INFO "proc_iterate is %p", proc_iterate);
filp_close(proc_filp, NULL);
label_2:
test_proc = proc_create("test_proc", 0, NULL, &proc_fops);
proc_root = test_proc->parent;
printk(KERN_INFO "proc_root is %p", proc_root);
fs_ops = (struct file_operations *) proc_root->proc_fops;
printk(KERN_INFO "file_operations is %p", fs_ops);
proc_iterate = fs_ops->iterate;
printk(KERN_INFO "proc_iterate is %p", proc_iterate);
remove_proc_entry("test_proc", NULL);
return 0;
}
As per method 1, I open '/proc' as a file, then follow the file pointer to access it's 'struct file_operations' (f_op), and then follow this pointer to try and locate 'iterate'. However, although I am able to successfully access the 'f_op' structure, somehow it's 'iterate' seems to point to NULL. The following dmesg log shows the output of this.
[ 47.707558] file_operations is 00000000b8d10f59
[ 47.707564] proc_iterate is (null)
As per method 2, I create a new proc directory entry, then try to access it's parent directory (which should point to '/proc' itself), and then try to access it's 'struct file_operations' (proc_fops), and then try to follow on to 'iterate'. However, using this method I am not even able to access the '/proc' directory, as 'proc_root = test_proc->parent;' seems to return NULL. This causes a 'kernel NULL pointer dereference' error in the code that follows. The following dmesg log shows the output of this.
[ 212.078552] proc_root is (null)
[ 212.078567] BUG: unable to handle kernel NULL pointer dereference at 000000000000003
Now, I know that things have changed in the linux kernel, and we are not allowed to write at various addresses (like change the iterate pointer to point at a custom function for example) and that this can be overcome by making those pages writable in the kernel, but that will come later in my attempt to create this rootkit. At present I can't even figure out how to even read the original 'iterate' pointer.
So, I have these questions :
[1] Is something wrong in the following code ? How to fix it ?
[2] Is there any other way to access the pointer to the /proc 's iterate function ?
Tested on Arch Linux running linux 4.15.7

Configure kern.log to give more info about a segfault

Currently I can find in kern.log entries like this:
[6516247.445846] ex3.x[30901]: segfault at 0 ip 0000000000400564 sp 00007fff96ecb170 error 6 in ex3.x[400000+1000]
[6516254.095173] ex3.x[30907]: segfault at 0 ip 0000000000400564 sp 00007fff0001dcf0 error 6 in ex3.x[400000+1000]
[6516662.523395] ex3.x[31524]: segfault at 7fff80000000 ip 00007f2e11e4aa79 sp 00007fff807061a0 error 4 in libc-2.13.so[7f2e11dcf000+180000]
(You see, apps causing segfault are named ex3.x, means exercise 3 executable).
Is there a way to ask kern.log to log the complete path? Something like:
[6...] /home/user/cclass/ex3.x[3...]: segfault at 0 ip 0564 sp 07f70 error 6 in ex3.x[4...]
So I can easily figure out from who (user/student) this ex3.x is?
Thanks!
Beco
That log message comes from the kernel with a fixed format that only includes the first 16 letters of the executable excluding the path as per show_signal_msg, see other relevant lines for segmentation fault on non x86 architectures.
As mentioned by Makyen, without significant changes to the kernel and a recompile, the message given to klogd which is passed to syslog won't have the information you are requesting.
I am not aware of any log transformation or injection functionality in syslog or klogd which would allow you to take the name of the file and run either locate or file on the filesystem in order to find the full path.
The best way to get the information you are looking for is to use crash interception software like apport or abrt or corekeeper. These tools store the process metadata from the /proc filesystem including the process's commandline which would include the directory run from, assuming the binary was run with a full path, and wasn't already in path.
The other more generic way would be to enable core dumps, and then to set /proc/sys/kernel/core_pattern to include %E, in order to have the core file name including the path of the binary.
The short answer is: No, it is not possible without making code changes and recompiling the kernel. The normal solution to this problem is to instruct your students to name their executable <student user name>_ex3.x so that you can easily have this information.
However, it is possible to get the information you desire from other methods. Appleman1234 has provided some alternatives in his answer to this question.
How do we know the answer is "Not possible to the the full path in the kern.log segfault messages without recompiling the kernel":
We look in the kernel source code to find out how the message is produced and if there are any configuration options.
The files in question are part of the kernel source. You can download the entire kernel source as an rpm package (or other type of package) for whatever version of linux/debian you are running from a variety of places.
Specifically, the output that you are seeing is produced from whichever of the following files is for your architecture:
linux/arch/sparc/mm/fault_32.c
linux/arch/sparc/mm/fault_64.c
linux/arch/um/kernel/trap.c
linux/arch/x86/mm/fault.c
An example of the relevant function from one of the files(linux/arch/x86/mm/fault.c):
/*
* Print out info about fatal segfaults, if the show_unhandled_signals
* sysctl is set:
*/
static inline void
show_signal_msg(struct pt_regs *regs, unsigned long error_code,
unsigned long address, struct task_struct *tsk)
{
if (!unhandled_signal(tsk, SIGSEGV))
return;
if (!printk_ratelimit())
return;
printk("%s%s[%d]: segfault at %lx ip %p sp %p error %lx",
task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG,
tsk->comm, task_pid_nr(tsk), address,
(void *)regs->ip, (void *)regs->sp, error_code);
print_vma_addr(KERN_CONT " in ", regs->ip);
printk(KERN_CONT "\n");
}
From that we see that the variable passed to printout the process identifier is tsk->comm where struct task_struct *tsk and regs->ip where struct pt_regs *regs
Then from linux/include/linux/sched.h
struct task_struct {
...
char comm[TASK_COMM_LEN]; /* executable name excluding path
- access with [gs]et_task_comm (which lock
it with task_lock())
- initialized normally by setup_new_exec */
The comment makes it clear that the path for the executable is not stored in the structure.
For regs->ip where struct pt_regs *regs, it is defined in whichever of the following are appropriate for your architecture:
arch/arc/include/asm/ptrace.h
arch/arm/include/asm/ptrace.h
arch/arm64/include/asm/ptrace.h
arch/cris/include/arch-v10/arch/ptrace.h
arch/cris/include/arch-v32/arch/ptrace.h
arch/metag/include/asm/ptrace.h
arch/mips/include/asm/ptrace.h
arch/openrisc/include/asm/ptrace.h
arch/um/include/asm/ptrace-generic.h
arch/x86/include/asm/ptrace.h
arch/xtensa/include/asm/ptrace.h
From there we see that struct pt_regs is defining registers for the architecture. ip is just: unsigned long ip;
Thus, we have to look at what print_vma_addr() does. It is defined in mm/memory.c
/*
* Print the name of a VMA.
*/
void print_vma_addr(char *prefix, unsigned long ip)
{
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
/*
* Do not print if we are in atomic
* contexts (in exception stacks, etc.):
*/
if (preempt_count())
return;
down_read(&mm->mmap_sem);
vma = find_vma(mm, ip);
if (vma && vma->vm_file) {
struct file *f = vma->vm_file;
char *buf = (char *)__get_free_page(GFP_KERNEL);
if (buf) {
char *p;
p = d_path(&f->f_path, buf, PAGE_SIZE);
if (IS_ERR(p))
p = "?";
printk("%s%s[%lx+%lx]", prefix, kbasename(p),
vma->vm_start,
vma->vm_end - vma->vm_start);
free_page((unsigned long)buf);
}
}
up_read(&mm->mmap_sem);
}
Which shows us that a path was available. We would need to check that it was the path, but looking a bit further in the code gives a hint that it might not matter. We need to see what kbasename() did with the path that is passed to it. kbasename() is defined in include/linux/string.h as:
/**
* kbasename - return the last part of a pathname.
*
* #path: path to extract the filename from.
*/
static inline const char *kbasename(const char *path)
{
const char *tail = strrchr(path, '/');
return tail ? tail + 1 : path;
}
Which, even if the full path is available prior to it, chops off everything except for the last part of a pathname, leaving the filename.
Thus, no amount of runtime configuration options will permit printing out the full pathname of the file in the segment fault messages you are seeing.
NOTE: I've changed all of the links to kernel source to be to archives, rather than the original locations. Those links will get close to the code as it was at the time I wrote this, 2104-09. As should be no surprise, the code does evolve over time, so the code which is current when you're reading this may or may not be similar or perform in the way which is described here.

How does linux know when to allocate more pages to a call stack?

Given the program below, segfault() will (As the name suggests) segfault the program by accessing 256k below the stack. nofault() however, gradually pushes below the stack all the way to 1m below, but never segfaults.
Additionally, running segfault() after nofault() doesn't result in an error either.
If I put sleep()s in nofault() and use the time to cat /proc/$pid/maps I see the allocated stack space grows between the first and second call, this explains why segfault() doesn't crash afterwards - there's plenty of memory.
But the disassembly shows there's no change to %rsp. This makes sense since that would screw up the call stack.
I presumed that the maximum stack size would be baked into the binary at compile time (In retrospect that would be very hard for a compiler to do) or that it would just periodically check %rsp and add a buffer after that.
How does the kernel know when to increase the stack memory?
#include <stdio.h>
#include <unistd.h>
void segfault(){
char * x;
int a;
for( x = (char *)&x-1024*256; x<(char *)(&x+1); x++){
a = *x & 0xFF;
printf("%p = 0x%02x\n",x,a);
}
}
void nofault(){
char * x;
int a;
sleep(20);
for( x = (char *)(&x); x>(char *)&x-1024*1024; x--){
a = *x & 0xFF;
printf("%p = 0x%02x\n",x,a);
}
sleep(20);
}
int main(){
nofault();
segfault();
}
The processor raises a page fault when you access an unmapped page. The kernel's page fault handler checks whether the address is reasonably close to the process's %rsp and if so, it allocates some memory and resumes the process. If you are too far below %rsp, the kernel passes the fault along to the process as a signal.
I tried to find the precise definition of what addresses are close enough to %rsp to trigger stack growth, and came up with this from linux/arch/x86/mm.c:
/*
* Accessing the stack below %sp is always a bug.
* The large cushion allows instructions like enter
* and pusha to work. ("enter $65535, $31" pushes
* 32 pointers and then decrements %sp by 65535.)
*/
if (unlikely(address + 65536 + 32 * sizeof(unsigned long) < regs->sp)) {
bad_area(regs, error_code, address);
return;
}
But experimenting with your program I found that 65536+32*sizeof(unsigned long) isn't the actual cutoff point between segfault and no segfault. It seems to be about twice that value. So I'll just stick with the vague "reasonably close" as my official answer.

linux get process name from pid within kernel

hi
i have used sys_getpid() from within kernel to get process id
how can I find out process name from kernel struct? does it exist in kernel??
thanks very much
struct task_struct contains a member called comm, it contains executable name excluding path.
Get current macro from this file will get you the name of the program that launched the current process (as in insmod / modprobe).
Using above info you can use get the name info.
Not sure, but find_task_by_pid_ns might be useful.
My kernel module loads with "modprobe -v my_module --allow-unsupported -o some-data" and I extract the "some-data" parameter. The following code gave me the entire command line, and here is how I parsed out the parameter of interest:
struct mm_struct *mm;
unsigned char x, cmdlen;
mm = get_task_mm(current);
down_read(&mm->mmap_sem);
cmdlen = mm->arg_end - mm->arg_start;
for(x=0; x<cmdlen; x++) {
if(*(unsigned char *)(mm->arg_start + x) == '-' && *(unsigned char *)(mm->arg_start + (x+1)) == 'o') {
break;
}
}
up_read(&mm->mmap_sem);
if(x == cmdlen) {
printk(KERN_ERR "inject: ERROR - no target specified\n");
return -EINVAL;
}
strcpy(target,(unsigned char *)(mm->arg_start + (x+3)));
"target" holds the string after the -o parameter. You can compress this somewhat - the caller (in this case, modprobe) will be the first string in mm->arg_start - to suit your needs.
you can look at the special files in /proc/<pid>/
For example, /proc/<pid>/exe is a symlink pointing to the actual binary.
/proc/<pid>/cmdline is a null-delimited list of the command line, so the first word is the process name.

Resources