How to kill a process in a system call? - linux

I found out that sys_kill can be used to kill process from a system call, but when i compile the following code, i get the following error:
error: implicit declaration of function ‘sys_kill’ [-Werror=implicit-function-declaration]
long kill = sys_kill(pid,SIGKILL);
#define _POSIX_SOURCE
#include <linux/kernel.h>
#include <linux/unistd.h>
#include <linux/module.h>
#include <linux/init.h>
#include <linux/sched.h>
#include <linux/cred.h>
asmlinkage long sys_killa(pid_t pid)
{
printk(KERN_INFO "Current UID = %u\n",get_current_user()->uid);
printk(KERN_WARNING "The process to be killed is %d \n", pid);
long kill = sys_kill(pid,SIGKILL);
printk(KERN_WARNING "sys kill returned %ld\n", kill);
return 0;
}

It is often not possible to call the entry point of a system call from the kernel, since the API is for use from user space. Sometimes the functionality is provided in code close to the implementation code. It is usually made available throughout the kernel via an EXPORT_SYMBOL() macro.
For the kill() system call there is the internal kernel function kill_pid, with the declaration
int kill_pid(struct pid *pid, int sig, int priv)
You need to pass a struct pointer to the process, signal number, and boolean 1. Look at other code making this call for how to do so.

Related

How to prevent page faults after child exits?

A nice way of creating a snapshot of a process is to use fork() to create a child process. The memory of the child process will be a copy of the parent process.
Instead of eagerly copying all the memory, the OS simply marks the pages as copy-on-write: the pages will be cloned if the event of one of the processes writing to it. This saves both time and space, which is great.
In the event the child process exits, the copy-on-write behavior should be deactivated. However, I'm getting page faults for the whole array -- is there any way of optimizing these page faults? e.g. similar to how MAP_POPULATE avoids page faults for the initial access to the pages of a mapped region.
Below there is a simple benchmark that demonstrates the behavior I'm asking about. I check for page faults via perf stat -e minor-faults,major-faults ./a.out.
If no child process is created (WITH_CHILD set to false) I have very few page faults (around 125 and constant). However, just by creating and reaping the child process, I get page faults in everything (around 131260, proportional to array size). As the pages are mapped by a single process, I wouldn't expect any page faults to happen! Why do they?
This is a follow-up of Kernel copying CoW pages after child process exit.
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <array>
#include <cassert>
#include <cstring>
#include <iostream>
#define ARRAY_SIZE 536870912 // 512MB
#define WITH_CHILD true
using inttype = uint64_t;
constexpr uint64_t NUM_ELEMS() {
return ARRAY_SIZE / sizeof(inttype);
}
int main() {
// allocate array
void *arraybuf = mmap(nullptr, ARRAY_SIZE, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
assert(arraybuf != nullptr);
std::array<inttype, NUM_ELEMS()> *array =
new (arraybuf) std::array<inttype, NUM_ELEMS()>();
#if WITH_CHILD
// spawn checkpointing process
int pid = fork();
assert(pid != -1);
// child process -- do nothing, just exit
if (pid == 0) {
exit(0);
}
// wait for child thread to exit
assert(waitpid(pid, nullptr, 0) == pid);
#endif
// write to array -- this shouldnt generate page faults, right? :(
std::fill(array->begin(), array->end(), 0);
// cleanup
munmap(array, ARRAY_SIZE);
}

Calling "clone()" on linux but it seems to malfunction

A simple test program, I expect it will "clone" to fork a child process, and each process can execute till its end
#include<stdio.h>
#include<sched.h>
#include<unistd.h>
#include<sys/types.h>
#include<errno.h>
int f(void*arg)
{
pid_t pid=getpid();
printf("child pid=%d\n",pid);
}
char buf[1024];
int main()
{
printf("before clone\n");
int pid=clone(f,buf,CLONE_VM|CLONE_VFORK,NULL);
if(pid==-1){
printf("%d\n",errno);
return 1;
}
waitpid(pid,NULL,0);
printf("after clone\n");
printf("father pid=%d\n",getpid());
return 0;
}
Ru it:
$g++ testClone.cpp && ./a.out
before clone
It didn't print what I expected. Seems after "clone" the program is in unknown state and then quit. I tried gdb and it prints:
Breakpoint 1, main () at testClone.cpp:15
(gdb) n-
before clone
(gdb) n-
waiting for new child: No child processes.
(gdb) n-
Single stepping until exit from function clone#plt,-
which has no line number information.
If I remove the line of "waitpid", then gdb prints another kind of weird information.
(gdb) n-
before clone
(gdb) n-
Detaching after fork from child process 26709.
warning: Unexpected waitpid result 000000 when waiting for vfork-done
Cannot remove breakpoints because program is no longer writable.
It might be running in another process.
Further execution is probably impossible.
0x00007fb18a446bf1 in clone () from /lib64/libc.so.6
ptrace: No such process.
Where did I get wrong in my program?
You should never call clone in a user-level program -- there are way too many restrictions on what you are allowed to do in the cloned process.
In particular, calling any libc function (such as printf) is a complete no-no (because libc doesn't know that your clone exists, and have not performed any setup for it).
As K. A. Buhr points out, you also pass too small a stack, and the wrong end of it. Your stack is also not properly aligned.
In short, even though K. A. Buhr's modification appears to work, it doesn't really.
TL;DR: clone, just don't use it.
The second argument to clone is a pointer to the child's stack. As per the manual page for clone(2):
Stacks grow downward on all processors that run Linux (except the HP PA processors), so child_stack usually points to the topmost address of the memory space set up for the child stack.
Also, 1024 bytes is a paltry amount for a stack. The following modified version of your program appears to run correctly:
// #define _GNU_SOURCE // may be needed if compiled as C instead of C++
#include <stdio.h>
#include <sched.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <errno.h>
int f(void*arg)
{
pid_t pid=getpid();
printf("child pid=%d\n",pid);
return 0;
}
char buf[1024*1024]; // *** allocate more stack ***
int main()
{
printf("before clone\n");
int pid=clone(f,buf+sizeof(buf),CLONE_VM|CLONE_VFORK,NULL);
// *** in previous line: pointer is to *end* of stack ***
if(pid==-1){
printf("%d\n",errno);
return 1;
}
waitpid(pid,NULL,0);
printf("after clone\n");
printf("father pid=%d\n",getpid());
return 0;
}
Also, #Employed Russian is right -- you probably shouldn't use clone except if you're trying to have some fun. Either fork or vfork are more sensible interfaces to clone whenever they meet your needs.

How to extract features from linux kernel?

I'm working on a project which detects a malware based on Machine Learning techniques. My primary targets are linux devices. My first question is;
How can I extract data about processes from a linux kernel using a kernel driver?
I'd like to extract data about running processes by myself for the first time just for proof of concept. Later on I'd like to write a kernel driver to do that automatically and in real time.
Are there any other ways to extract data for running processes such as ProcessName, PID, UID, IS_ROOT and etc.?
To do this from User space:
ps -U <username/UID> | tr -s ' '| tr ' ' ','| cut -d ',' -f2,5 > out.csv
From Kernel space, as a module:
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/sched.h>
static int uid=0;
static int procx_init(void){
struct task_struct *task;
for_each_process(task)
printk ("uid=%d, pid=%d, command=%s\n", task->cred->uid, task->pid, task->comm);
return 0;
}
static void procx_exit(void)
{
printk("procx destructor\n");
}
module_init(procx_init);
module_exit(procx_exit);
module_param(uid, int, 0);
MODULE_AUTHOR ("sundeep471#gmail.com");
MODULE_DESCRIPTION ("Print process Info");
MODULE_LICENSE("GPL");
I didn't check for the UID, but you can pass it as module parameter or runtime passer to trigger a kthread

Ptrace reset a breakpoint

I am having trouble resetting a process after I have hit a breakpoint with Ptrace. I am essentially wrapping this code in python.
I am running this on 64 bit Ubuntu.
I understand the concept of resetting the data at the location and decrementing the instruction pointer, but after I get the trap signal and do that, my process is not finishing.
Code snippet:
# Continue to bp
res = libc.ptrace(PTRACE_CONT,pid,0,0)
libc.wait(byref(wait_status))
if _wifstopped(wait_status):
print('Breakpoint hit. Signal: %s' % (strsignal(_wstopsig(wait_status))))
else:
print('Error process failed to stop')
exit(1)
# Reset Instruction pointer
data = get_registers(pid)
print_rip(data)
data.rip -= 1
res = set_registers(pid,data)
# Verify rip
print_rip(get_registers(pid))
# Reset Instruction
out = set_text(pid,c_ulonglong(addr),c_ulonglong(initial_data))
if out != 0:
print_errno()
print_text(c_ulonglong(addr),c_ulonglong(get_text(c_void_p(addr))))
And I run a PTRACE_DETACH right after returning from this code.
When I run this, it hits the breakpoint the parent process returns successfully, but the child does not resume and finish its code.
If I comment out the call to the breakpoint function it just attaches ptrace to the process and then detaches it, and the program runs fine.
The program itself is just a small c program that prints 10 times to a file.
Full code is in this paste
Is there an error anyone sees with my breakpoint code?
I ended up writing a C program that was as exact a duplicate of the python code as possible:
#include <stdio.h>
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>
#include <signal.h>
#include <syscall.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/reg.h>
#include <sys/user.h>
#include <unistd.h>
#include <errno.h>
#include <time.h>
void set_unset_bp(pid){
int wait_status;
struct user_regs_struct regs;
unsigned long long addr = 0x0000000000400710;
unsigned long long data = ptrace(PTRACE_PEEKTEXT,pid,(void *)addr,0);
printf("Orig data: 0x%016x\n",data);
unsigned long long trap = (data & 0xFFFFFFFFFFFFFF00) | 0xCC;
ptrace(PTRACE_POKETEXT,pid,(void *)addr,(void *)trap);
ptrace(PTRACE_CONT,pid,0,0);
wait(&wait_status);
if(WIFSTOPPED(wait_status)){
printf("Signal recieved: %s\n",strsignal(WSTOPSIG(wait_status)));
}else{
perror("wait");
}
ptrace(PTRACE_POKETEXT,pid,(void *)addr,(void *)data);
ptrace(PTRACE_GETREGS,pid,0,&regs);
regs.rip -=1;
ptrace(PTRACE_SETREGS,pid,0,&regs);
data = ptrace(PTRACE_PEEKTEXT,pid,(void *)addr,0);
printf("Data after resetting bp data: 0x%016x\n",data);
ptrace(PTRACE_CONT,pid,0,0);
}
int main(void){
//Fork child process
extern int errno;
int pid = fork();
if(pid ==0){//Child
ptrace(PTRACE_TRACEME,0,0,0);
int out = execl("/home/chris/workspace/eliben-debugger/print","/home/chris/workspace/eliben-debugger/print",0);
if(out != 0){
printf("Error Value is: %s\n", strerror(errno));
}
}else{ //Parent
wait(0);
printf("Got stop signal, we just execv'd\n");
set_unset_bp(pid);
printf("Finished setting and unsetting\n");
wait(0);
printf("Got signal, detaching\n");
ptrace(PTRACE_DETACH,pid,0,0);
wait(0);
printf("Parent exiting after waiting for child to finish\n");
}
exit(0);
}
After comparing the output to my Python output I noticed that according to python my original data was 0xfffffffffffe4be8 and 0x00000000fffe4be8.
This lead me to believe that my return data was getting truncated to a 32 bit value.
I changed my get and set methods to something like this, setting the return type to a void pointer:
def get_text(addr):
restype = libc.ptrace.restype
libc.ptrace.restype = c_void_p
out = libc.ptrace(PTRACE_PEEKTEXT,pid,addr, 0)
libc.ptrace.restype = restype
return out
def set_text(pid,addr,data):
return libc.ptrace(PTRACE_POKETEXT,pid,addr,data)
Can't tell you how it works yet, but I was able to get the child process executing successfully after the trap.

Ordering file location on linux partition

I have a process which processes a lot of files (~96,000 files, ~12 TB data). Several runs of the process has left the files scattered about the drive. Each iteration in the process, uses several files. This leads to a lot of whipsawing around the disk collecting the files.
Ideally, I would like the process to write the files it uses in order, so that the next run will read them in order (file sizes change). Is there a way to hint at a physical ordering/grouping, short of writing to the raw partition?
Any other suggestions would be helpful.
Thanks
There are two system calls you might lookup: fadvise64, fallocate tell the kernel how you intend to read or write a given file.
Another tip is the "Orlov block allocator" (Wikipedia, LWN) affects the way the kernel will allocate new directories and file-entries.
In the end I decided not to worry about writing the files in any particular ordering. Instead, before I started a run, I would figure out where the first block of each file was located, and then sort the file processing order by first block location. Not perfect, but it did make a big difference in processing times.
Here's the C code I used to get the first block of supplied file list I adapted it from example code I found online (can't seem to find the original source).
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <assert.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>
#include <linux/fs.h>
//
// Get the first block for each file passed to stdin,
// write filename & first block for each file to stdout
//
int main(int argc, char **argv) {
int fd;
int block;
char fname[512];
while(fgets(fname, 511, stdin) != NULL) {
fname[strlen(fname) - 1] = '\0';
assert(fd=open(fname, O_RDONLY));
block = 0;
if (ioctl(fd, FIBMAP, &block)) {
printf("FIBMAP ioctl failed - errno: %s\n", strerror(errno));
}
printf("%010d, %s\n", block, fname);
close(fd);
}
return 0;
}

Resources