How to extract features from the Linux kernel?

I'm working on a project that detects malware using machine learning techniques. My primary targets are Linux devices. My first question is:
How can I extract data about processes from the Linux kernel using a kernel driver?
As a proof of concept, I'd first like to extract data about running processes by hand. Later on I'd like to write a kernel driver that does this automatically and in real time.
Are there any other ways to extract data about running processes, such as ProcessName, PID, UID, IS_ROOT, etc.?

To do this from User space:
ps -U <username/UID> | tr -s ' '| tr ' ' ','| cut -d ',' -f2,5 > out.csv
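If you would rather collect the same fields from a program than from ps, a minimal sketch in C could read /proc directly. It assumes a mounted Linux /proc and the usual "Name:" and "Uid:" lines in /proc/<pid>/status; struct proc_info, read_proc_info and dump_processes are names made up for this example, not an existing API.

```c
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical record type for this sketch: one row per process. */
struct proc_info {
    pid_t pid;
    unsigned int uid;   /* real UID, first field of the "Uid:" line */
    char name[64];      /* command name from the "Name:" line */
};

/* Fill *out from /proc/<pid>/status. Returns 0 on success, -1 on failure
 * (for example when the process exited before it could be read). */
int read_proc_info(pid_t pid, struct proc_info *out)
{
    char path[64], line[256];
    int have_name = 0, have_uid = 0;
    FILE *f;

    assert(out != NULL);
    snprintf(path, sizeof(path), "/proc/%ld/status", (long)pid);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    out->pid = pid;
    while (fgets(line, sizeof(line), f) != NULL) {
        if (sscanf(line, "Name: %63[^\n]", out->name) == 1)
            have_name = 1;
        else if (sscanf(line, "Uid: %u", &out->uid) == 1)
            have_uid = 1;
    }
    fclose(f);
    return (have_name && have_uid) ? 0 : -1;
}

/* Write "pid,uid,is_root,name" to dst for every numeric entry in /proc.
 * Returns the number of rows written, or -1 if /proc cannot be opened. */
int dump_processes(FILE *dst)
{
    DIR *d = opendir("/proc");
    struct dirent *e;
    int count = 0;

    if (d == NULL)
        return -1;
    while ((e = readdir(d)) != NULL) {
        struct proc_info pi;

        if (!isdigit((unsigned char)e->d_name[0]))
            continue;               /* not a PID directory */
        if (read_proc_info((pid_t)atol(e->d_name), &pi) != 0)
            continue;               /* process vanished or is hidden */
        fprintf(dst, "%ld,%u,%d,%s\n",
                (long)pi.pid, pi.uid, pi.uid == 0, pi.name);
        count++;
    }
    closedir(d);
    return count;
}
```

Calling dump_processes(stdout) prints one CSV row per visible process; the IS_ROOT column here is simply "real UID equals 0", which is an assumption about what that feature should mean.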
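From kernel space, as a module:
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/sched.h>
#include <linux/sched/signal.h> /* for_each_process() on kernels >= 4.11 */
static int uid = 0;
static int __init procx_init(void)
{
    struct task_struct *task;
    /* walk the task list under RCU and print one line per process */
    rcu_read_lock();
    for_each_process(task)
        printk(KERN_INFO "uid=%u, pid=%d, command=%s\n", from_kuid(&init_user_ns, task_uid(task)), task->pid, task->comm);
    rcu_read_unlock();
    return 0;
}
static void __exit procx_exit(void)
{
    printk(KERN_INFO "procx destructor\n");
}
module_init(procx_init);
module_exit(procx_exit);
module_param(uid, int, 0);
MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Print process Info");
I didn't filter on the UID, but you can pass it as a module parameter, or pass it in at runtime to trigger a kthread.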


How to prevent page faults after child exits?

A nice way of creating a snapshot of a process is to use fork() to create a child process. The memory of the child process will be a copy of the parent process.
Instead of eagerly copying all the memory, the OS simply marks the pages as copy-on-write: a page is cloned only in the event of one of the processes writing to it. This saves both time and space, which is great.
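To make the snapshot semantics concrete, here is a small sketch in C (not the benchmark from this question). The function name snapshot_demo and the pipe used to order the two processes are choices made for this example only.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the value the child observed after the parent overwrote its own
 * copy, or -1 on error. With fork() snapshots this is the pre-fork value. */
int snapshot_demo(void)
{
    static volatile int value;
    int go[2], status;
    char token = 'x';
    pid_t pid;

    value = 1;                      /* state at the time of the snapshot */
    if (pipe(go) == -1)
        return -1;
    pid = fork();
    if (pid == -1) {
        close(go[0]);
        close(go[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: wait until the parent has written, then report what we see. */
        close(go[1]);
        if (read(go[0], &token, 1) != 1)
            _exit(255);
        _exit(value);
    }
    /* Parent: this write hits a copy-on-write page, so the two processes
     * stop sharing it and the child keeps the old contents. */
    close(go[0]);
    value = 2;
    if (write(go[1], &token, 1) != 1) {
        close(go[1]);
        waitpid(pid, &status, 0);
        return -1;
    }
    close(go[1]);
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    assert(value == 2);             /* the parent still sees its own write */
    return WEXITSTATUS(status);
}
```

The child reports the value through its exit status, so this only works for small integers; it is meant as an illustration of the semantics, not as a checkpointing mechanism.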
In the event the child process exits, the copy-on-write behavior should be deactivated. However, I'm getting page faults for the whole array -- is there any way of optimizing these page faults? e.g. similar to how MAP_POPULATE avoids page faults for the initial access to the pages of a mapped region.
Below there is a simple benchmark that demonstrates the behavior I'm asking about. I check for page faults via perf stat -e minor-faults,major-faults ./a.out.
If no child process is created (WITH_CHILD set to false) I have very few page faults (around 125 and constant). However, just by creating and reaping the child process, I get page faults in everything (around 131260, proportional to array size). As the pages are mapped by a single process, I wouldn't expect any page faults to happen! Why do they?
This is a follow-up of Kernel copying CoW pages after child process exit.
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <iostream>
#include <new>
#define ARRAY_SIZE 536870912 // 512MB
#define WITH_CHILD true
using inttype = uint64_t;
constexpr uint64_t NUM_ELEMS() {
  return ARRAY_SIZE / sizeof(inttype);
}
int main() {
  // allocate array (flags assumed: private anonymous mapping, pre-faulted with MAP_POPULATE)
  void *arraybuf = mmap(nullptr, ARRAY_SIZE, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
  assert(arraybuf != MAP_FAILED);
  std::array<inttype, NUM_ELEMS()> *array =
      new (arraybuf) std::array<inttype, NUM_ELEMS()>();
#if WITH_CHILD
  // spawn checkpointing process
  int pid = fork();
  assert(pid != -1);
  // child process -- do nothing, just exit
  if (pid == 0) {
    _exit(0);
  }
  // wait for child process to exit
  assert(waitpid(pid, nullptr, 0) == pid);
#endif
  // write to array -- this shouldn't generate page faults, right? :(
  std::fill(array->begin(), array->end(), 0);
  // cleanup
  munmap(array, ARRAY_SIZE);
  return 0;
}
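As an alternative to running the whole program under perf stat, the faults caused by one specific region of code can be counted in-process with getrusage(). This is a sketch of that swapped-in technique in C, not part of the benchmark above; minor_faults_so_far and faults_while_filling are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/resource.h>

/* Minor page faults taken by the calling process so far, or -1 on error. */
long minor_faults_so_far(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_minflt;
}

/* Write to every byte of buf and return how many minor faults were
 * recorded while doing so, or -1 if the counter could not be read. */
long faults_while_filling(void *buf, size_t len)
{
    long before, after;

    assert(buf != NULL);
    before = minor_faults_so_far();
    memset(buf, 0, len);
    after = minor_faults_so_far();
    if (before < 0 || after < 0)
        return -1;
    return after - before;
}
```

Wrapping only the write to the array in such a helper would separate the faults taken after the child exits from those taken during startup. The counter covers the whole process, so other threads would add noise.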

Cygwin FIFO vs native Linux FIFO - discrepancy in blocking behaviour?

The code shown is based on an example using named pipes from some tutorial site:
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <fcntl.h>
#include <string.h>
#define FIFO_FILE "MYFIFO"
int main()
{
    int fd;
    char readbuf[80];
    int read_bytes;
    // mknod(FIFO_FILE, S_IFIFO|0640, 0);
    mkfifo(FIFO_FILE, 0777);
    while(1) {
        fd = open(FIFO_FILE, O_RDONLY);
        read_bytes = read(fd, readbuf, sizeof(readbuf));
        readbuf[read_bytes] = '\0';
        printf("Received string: \"%s\". Length is %d\n", readbuf, (int)strlen(readbuf));
    }
    return 0;
}
When the server is run on Windows under Cygwin, it enters an undesired loop, repeating the same message. For example, if you write in a shell:
$ ./server
then the server waits for the client. But once the FIFO is no longer empty, e.g. after writing in a new shell
$ echo "Hello" > MYFIFO
the server enters an infinite loop, repeating the "Hello" string:
Received string: "Hello". Length is 4
Received string: "Hello". Length is 4
Furthermore, new strings written to the FIFO don't seem to be read by the server. On Linux, however, the behaviour is quite different: the server prints the string and waits for new data to appear on the FIFO. What is the reason for this discrepancy?
You need to fix your code to remove at least 3 bugs:
You're not doing a close(fd) so you will get a file descriptor leak and eventually be unable to open() new files.
You're not checking the value of fd (if it returns -1 then there was an error).
You're not checking the value of read (if it returns -1 then there was an error)... and your readbuf[read_bytes] = '\0'; will not be doing what you expect as a result.
When you get an error then errno will tell you what went wrong.
These bugs probably explain why you keep getting Hello output (especially the readbuf[read_bytes] problem).
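Put together, the three fixes could look something like the following sketch in C. The helper names read_fifo_once and serve_fifo are made up for this example, and the 0600 mode and max_msgs limit are assumptions rather than anything from the tutorial code.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Open the FIFO at path, read one chunk into buf (NUL-terminated), close it.
 * Returns the number of bytes read, 0 on end-of-file, or -1 with errno set. */
ssize_t read_fifo_once(const char *path, char *buf, size_t size)
{
    int fd, saved;
    ssize_t n;

    assert(buf != NULL && size > 0);
    fd = open(path, O_RDONLY);          /* blocks until a writer opens it */
    if (fd == -1)
        return -1;                      /* fix 2: check the result of open() */
    do {
        n = read(fd, buf, size - 1);    /* leave room for the terminator */
    } while (n == -1 && errno == EINTR);
    saved = errno;
    close(fd);                          /* fix 1: no descriptor leak */
    if (n == -1) {                      /* fix 3: never index with -1 */
        errno = saved;
        return -1;
    }
    buf[n] = '\0';
    return n;
}

/* Create the FIFO if needed and handle up to max_msgs opens of it. */
int serve_fifo(const char *path, int max_msgs)
{
    char buf[80];
    int i;

    if (mkfifo(path, 0600) == -1 && errno != EEXIST)
        return -1;
    for (i = 0; i < max_msgs; i++) {
        ssize_t n = read_fifo_once(path, buf, sizeof(buf));

        if (n == -1) {
            perror("read_fifo_once");
            return -1;
        }
        if (n == 0)
            continue;                   /* writer closed without sending data */
        printf("Received string: \"%s\". Length is %zd\n", buf, n);
    }
    return 0;
}
```

Reading at most size - 1 bytes also avoids writing the terminator one past the end of the buffer when a full 80 bytes arrive.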

How to kill a process in a system call?

I found out that sys_kill can be used to kill a process from a system call, but when I compile the following code, I get the following error:
error: implicit declaration of function ‘sys_kill’ [-Werror=implicit-function-declaration]
long kill = sys_kill(pid,SIGKILL);
#include <linux/kernel.h>
#include <linux/unistd.h>
#include <linux/module.h>
#include <linux/init.h>
#include <linux/sched.h>
#include <linux/cred.h>
asmlinkage long sys_killa(pid_t pid)
{
    printk(KERN_INFO "Current UID = %u\n",get_current_user()->uid);
    printk(KERN_WARNING "The process to be killed is %d \n", pid);
    long kill = sys_kill(pid,SIGKILL);
    printk(KERN_WARNING "sys kill returned %ld\n", kill);
    return 0;
}
It is often not possible to call the entry point of a system call from inside the kernel, since that API is meant for use from user space. Instead, the same functionality is usually provided by an internal function close to the system call's implementation, and made available throughout the kernel via an EXPORT_SYMBOL() macro.
For the kill() system call there is the internal kernel function kill_pid, with the declaration
int kill_pid(struct pid *pid, int sig, int priv)
You need to pass a pointer to the target's struct pid (for example one obtained with find_get_pid() and released afterwards with put_pid()), the signal number, and 1 for priv, which marks the signal as sent by the kernel. Look at other code making this call for how to do so.

shm_open() fails with EINVAL when creating shared memory in subdirectory of /dev/shm

I have a GNU/Linux application which uses a number of shared memory objects. It could, potentially, be run a number of times on the same system. To keep things tidy, I first create a directory in /dev/shm for each set of shared memory objects.
The problem is that on newer GNU/Linux distributions, I no longer seem to be able to create these in a sub-directory of /dev/shm.
The following is a minimal C program which illustrates what I'm talking about:
/*
 * shm_minimal.c
 *
 * Test shm_open()
 *
 * Expect to create shared memory file in:
 *  /dev/shm/
 *  └── my_dir
 *      └── shm_name
 *
 * NOTE: Only visible on filesystem during execution. I try to be nice, and
 *       clean up after myself.
 *
 * Compile with:
 *   $ gcc -lrt shm_minimal.c -o shm_minimal
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
int main(int argc, const char* argv[]) {
  int shm_fd = -1;
  char* shm_dir = "/dev/shm/my_dir";
  char* shm_file = "/my_dir/shm_name"; /* does NOT work */
  //char* shm_file = "/my_dir_shm_name"; /* works */
  // Create directory in /dev/shm
  mkdir(shm_dir, 0777);
  // make shared memory segment
  shm_fd = shm_open(shm_file, O_RDWR | O_CREAT, 0600);
  if (-1 == shm_fd) {
    switch (errno) {
    case EINVAL:
      /* Confirmed on:
       *  kernel v3.14, GNU libc v2.19 (ArchLinux)
       *  kernel v3.13, GNU libc v2.19 (Ubuntu 14.04 Beta 2)
       */
      perror("FAIL - EINVAL");
      return 1;
    default:
      printf("Some other problem not being tested\n");
      return 2;
    }
  } else {
    /* Confirmed on:
     *  kernel v3.8, GNU libc v2.17 (Mint 15)
     *  kernel v3.2, GNU libc v2.15 (Xubuntu 12.04 LTS)
     *  kernel v3.1, GNU libc v2.13 (Debian 6.0)
     *  kernel v2.6.32, GNU libc v2.12 (RHEL 6.4)
     */
    printf("Success !!!\n");
  }
  // clean up
  close(shm_fd);
  shm_unlink(shm_file);
  rmdir(shm_dir);
  return 0;
}
/* vi: set ts=2 sw=2 ai expandtab:
 */
When I run this program on a fairly new distribution, the call to shm_open() returns -1, and errno is set to EINVAL. However, when I run on something a little older, it creates the shared memory object in /dev/shm/my_dir as expected.
For the larger application, the solution is simple. I can use a common prefix instead of a directory.
If you could help enlighten me to this apparent change in behavior it would be very helpful. I suspect someone else out there might be trying to do something similar.
So it turns out the issue stems from how GNU libc validates the shared memory name. Specifically, the shared memory object MUST now be at the root of the shmfs mount point.
This was changed in glibc git commit b20de2c3d9 as the result of bug BZ #16274.
Specifically, the change is the line:
if (name[0] == '\0' || namelen > NAME_MAX || strchr (name, '/') != NULL)
Which now disallows '/' from anywhere in the filename (not counting leading '/')
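To see which names the new check accepts, the quoted condition can be wrapped in a small stand-alone function. This is a sketch in C that paraphrases the check, not glibc's actual code: the function name shm_name_is_valid is made up, and the leading-slash skipping and the "length including terminator" computation are assumptions about the surrounding glibc logic.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#ifndef NAME_MAX
#define NAME_MAX 255   /* fallback for this sketch if <limits.h> lacks it */
#endif

/* Returns 1 if name would pass the check quoted above, 0 otherwise. */
int shm_name_is_valid(const char *name)
{
    size_t namelen;

    assert(name != NULL);
    while (name[0] == '/')          /* leading slashes do not count */
        ++name;
    namelen = strlen(name) + 1;     /* assumed: length including the NUL */
    if (name[0] == '\0' || namelen > NAME_MAX || strchr(name, '/') != NULL)
        return 0;
    return 1;
}
```

With this, "/my_dir_shm_name" is accepted while "/my_dir/shm_name" is rejected, which matches the behaviour seen in the test program from the question.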
If you have a third-party tool that was broken by this shm_open change, a brilliant coworker found a workaround: preload a library that overrides the shm_open call and swaps slashes for underscores. It does the same for shm_unlink, so the application can still free shared memory when needed:
#include <dlfcn.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <algorithm>
#include <string>
// function used in place of the standard shm_open() function
extern "C" int shm_open(const char *name, int oflag, mode_t mode)
{
    // keep a function pointer to the real shm_open() function
    static int (*real_open)(const char *, int, mode_t) = NULL;
    // the first time in, ask the dynamic linker to find the real shm_open() function
    if (!real_open) real_open = (int (*)(const char *, int, mode_t)) dlsym(RTLD_NEXT,"shm_open");
    // take the name we were given and replace all slashes with underscores instead
    std::string n = name;
    std::replace(n.begin(), n.end(), '/', '_');
    // call the real open function with the patched path name
    return real_open(n.c_str(), oflag, mode);
}
// function used in place of the standard shm_unlink() function
extern "C" int shm_unlink(const char *name)
{
    // keep a function pointer to the real shm_unlink() function
    static int (*real_unlink)(const char *) = NULL;
    // the first time in, ask the dynamic linker to find the real shm_unlink() function
    if (!real_unlink) real_unlink = (int (*)(const char *)) dlsym(RTLD_NEXT, "shm_unlink");
    // take the name we were given and replace all slashes with underscores instead
    std::string n = name;
    std::replace(n.begin(), n.end(), '/', '_');
    // call the real unlink function with the patched path name
    return real_unlink(n.c_str());
}
To compile this file:
c++ -fPIC -shared -o -ldl
And preload it before starting a process that tries to use non-standard slash characters in shm_open:
in bash:
export LD_PRELOAD=/path/to/
in tcsh:
setenv LD_PRELOAD /path/to/

Ordering file location on linux partition

I have a process which processes a lot of files (~96,000 files, ~12 TB data). Several runs of the process have left the files scattered about the drive. Each iteration of the process uses several files, which leads to a lot of whipsawing around the disk to collect them.
Ideally, I would like the process to write the files it uses in order, so that the next run will read them in order (file sizes change). Is there a way to hint at a physical ordering/grouping, short of writing to the raw partition?
Any other suggestions would be helpful.
There are two system calls you might look up: fadvise64 and fallocate, which tell the kernel how you intend to read or write a given file.
Another thing to look at is the "Orlov block allocator" (Wikipedia, LWN), which affects the way the kernel allocates new directories and file entries.
In the end I decided not to worry about writing the files in any particular ordering. Instead, before I started a run, I would figure out where the first block of each file was located, and then sort the file processing order by first block location. Not perfect, but it did make a big difference in processing times.
Here's the C code I used to get the first block of each file in a supplied list. I adapted it from example code I found online (can't seem to find the original source).
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>
#include <linux/fs.h>
// Get the first block for each file passed to stdin,
// write filename & first block for each file to stdout
int main(int argc, char **argv) {
    int fd;
    int block;
    char fname[512];
    while (fgets(fname, sizeof(fname), stdin) != NULL) {
        // strip the trailing newline, if there is one
        fname[strcspn(fname, "\n")] = '\0';
        fd = open(fname, O_RDONLY);
        if (fd == -1) {
            fprintf(stderr, "open %s failed - errno: %s\n", fname, strerror(errno));
            continue;
        }
        // map logical block 0 to its physical block (FIBMAP usually needs root)
        block = 0;
        if (ioctl(fd, FIBMAP, &block)) {
            printf("FIBMAP ioctl failed - errno: %s\n", strerror(errno));
        }
        printf("%010d, %s\n", block, fname);
        close(fd);
    }
    return 0;
}
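The sorting step itself is not shown above. One way it could be done in C, as a sketch: struct file_entry and sort_by_first_block are names invented for this example, and the block numbers are assumed to come from output like that of the program above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record: a file name paired with the first block FIBMAP reported. */
struct file_entry {
    const char *name;
    int first_block;
};

/* qsort() comparator: ascending by first block, without integer overflow. */
static int by_first_block(const void *a, const void *b)
{
    const struct file_entry *fa = a, *fb = b;

    return (fa->first_block > fb->first_block) -
           (fa->first_block < fb->first_block);
}

/* Reorder files in place so they can be processed in on-disk order. */
void sort_by_first_block(struct file_entry *files, size_t count)
{
    if (count == 0)
        return;
    assert(files != NULL);
    qsort(files, count, sizeof(files[0]), by_first_block);
}
```

Only the first block of each file is considered, so fragmented files will still cause seeks; as noted above, that is not perfect but was enough to make a big difference.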
