How do I open a file in a kernel module if the calling process is in user space?

I am trying to create a character device driver that dumps /etc/shadow when read from as a non-privileged user. This is for purely academic purposes of course.
I was reading about how reading/writing files in kernel space opens a system to possible exploits. I am trying to implement this in practice.
Please spare me the "don't touch the filesystem in kernel mode" talk. I am precisely trying to exploit the nuances of doing so.
The problem is that the only working way I have found so far to open a file in kernel mode is filp_open, which currently fails with EACCES when I read from the device file as a non-privileged user. This was confounding at first, as I assumed that I could do anything in kernel space.
For example, when I cat the device file I created as a non-root user, filp_open fails with EACCES in kernel space.
Further investigation has led me to believe that filp_open checks the credentials of the calling process. This would make sense as it is used internally by open(), but I am in kernel mode here! There must be a way!
I am very new to programming in kernel space. I have extensive application C experience, but I am finding it difficult to navigate the kernel documentation for precisely what I am looking for. Additionally, it seems that more and more symbols within the kernel are not exported for use in modules. As I am developing an exploit proof of concept, I would like it to work without recompiling the kernel. I am finding a lot of code (vfs and syscalls) that is deprecated as the symbols are no longer exported to kernel modules.
Is what I am trying to do a thing that is specifically engineered against? Loading a kernel module requires root to begin with, so I would see this more in the lens of a persistence focused attack rather than an access one.
Also, I got the proof of concept working by just reading from the file when the module is loaded, but this is no fun! Any pointers here are much appreciated.

After some rethinking and digging I have found two solutions to my problem. Thank you to Tsyvarev and stark for the pointers.
Solution 1
The first solution is to elevate the privileges of the calling process before calling filp_open. This is also basically making a rootkit, so not as interesting.
Here is a link to the guide that I found on the subject.
Solution 2
The module will have an init function that by nature must be run with elevated privs when the module is loaded. So you can open the file pointer there and just close it when the module is unloaded. Caveats are that you have the file pointer open the whole time, so all of the gotchas there are still present. Better to only read, writing is where things can get a bit tricky. This is the solution I chose in the interim, as I didn't want this thing to be a full rootkit.
Another direction is to use a workqueue or to spawn a kernel thread. This is probably the trickiest option, but also the most in line with my original vision of this demo. I did not test this direction, but it is probably the best solution.


Linux kernel : logging to a specific file

I am trying to edit the Linux kernel. I want some information to be written out to a file as part of the debugging process. I have read about the printk function, but I would like to add text to a particular file (a file other than the default files that keep debug logs).
To cut it short: I would like to specify the "destination" in the printk function (or at least find some workaround for it).
How can I achieve this? Will using fwrite/fopen work (and if yes, will it work without causing much overhead compared to printk, since they are implemented differently)?
What other options do I have?
Using fopen and fwrite will certainly not work. Working with files in kernel space is generally a bad idea.
It all really depends on what you are doing in the kernel though. In some configurations, there may not even be a hard disk for you to write to. If however, you are working at a stage where you can have certain assumptions about the running kernel, you probably actually want to write a kernel module rather than edit the kernel itself. For all you care, a kernel module is just as good as any other part of the kernel, but they are inserted when the kernel is already up and running.
You may also be thinking of doing so for debugging, or have output of a kernel-level application (e.g. an application that you are forced to run at kernel level for real-time constraints etc). In that case, kio may be of interest to you, but if you want to use it, do make sure you understand why.
kio is a library I wrote just for those "kernel-level applications", which makes a kernel module see a /proc file as if it's a user of it (rather than a provider). To make it work, you should have a user-space application also opening that virtual file and redirect it to wherever you want to write your log. Something along the lines of opening the file with kopen in write mode and in user space tell cat /proc/your_file > ~/log_file.
Note: I still recommend printk unless you really know what you are doing. Since you are thinking of fopen in kernel space, I don't think you really know what you are doing.

Kernel module to monitor syscalls?

I would like to create a kernel module from scratch that latches to a user session and monitors each system call made by processes belonging to that user.
I know what everyone is thinking - "use strace" - but I'd like to have some of my own logging and analysis with the data I collect, and strace has some issues - an application could use "mmap" to write to a file without the file contents ever appearing as the arguments of an "open" system call, or an application without any write permission may create coredumps to copy sensitive data.
I want to be able to handle these special cases and do some of my own logging. I wonder though - how can I route all syscalls through my module? Is there any way to do that without touching the kernel code?
I don't have the exact answer to your question, but I read a paper a couple of days ago and it may be useful for you:
I have done something similar in the past by using a kernel module to patch the system call table. Each patched function did something like the following:
// pre checks
ret = origFunction(/*params*/);
// post checks
return ret;
Note that when you start mucking around in the kernel data structures, your module becomes version dependent. The kernel module will probably have to be compiled for the specific kernel version you are installing on.
Also note, this is a technique employed by many rootkits so if you have security software installed it may try to prevent you from doing something like this.
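To make the wrap-and-forward pattern above concrete, here is a minimal user-space sketch. It is not kernel code: call_table is a hypothetical stand-in for the system call table, and real_double, hooked, install_hook and remove_hook are names made up for this illustration. It only shows the mechanics of saving the original entry, replacing it with a wrapper, and restoring it on unload.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a syscall table: an array of function pointers. */
typedef long (*call_fn)(long arg);

long real_double(long arg) { return arg * 2; }

call_fn call_table[1] = { real_double };

call_fn orig_fn;   /* saved original entry, NULL while no hook is installed */
long pre_count;    /* how many calls passed through the "pre checks" */
long last_ret;     /* last return value seen by the "post checks" */

/* The wrapper: pre checks, forward to the original, post checks. */
long hooked(long arg)
{
    long ret;

    pre_count++;            /* pre checks */
    ret = orig_fn(arg);     /* call the original function */
    last_ret = ret;         /* post checks */
    return ret;
}

/* Save the original table entry and put the wrapper in its place. */
void install_hook(void)
{
    assert(orig_fn == NULL);
    orig_fn = call_table[0];
    call_table[0] = hooked;
}

/* Restore the original entry (what a module would do on unload). */
void remove_hook(void)
{
    assert(orig_fn != NULL);
    call_table[0] = orig_fn;
    orig_fn = NULL;
}
```

Callers that go through call_table[0] never notice the swap, which is also why restoring the original pointer before unloading matters.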

intercepting file system system calls

I am writing an application for which I need to intercept some filesystem system calls, e.g. unlink. Say I want to protect some file abc: if the user deletes it, I need to copy it to some other place first. So I need unlink to call my code before deleting abc so that I can save it. I have gone through threads related to intercepting system calls, but methods like LD_PRELOAD won't work in my case, because I want this to be secure and implemented in the kernel. inotify notifies after the event, so I would not be able to save the file. Could you suggest any such method? I would like to implement this in a kernel module instead of modifying the kernel code itself.
Another method was suggested by Graham Lee. I had thought of it, but it has some problems: I would need a hardlink mirror of all the files. It consumes no space but could still be problematic, as I would have to repeatedly mirror the drive to keep the mirror up to date. It also won't work across partitions or on partitions that do not support links. So I want a solution through which I can attach hooks to the files/directories and then watch for changes instead of scanning repeatedly.
I would also like to add support for writes to a modified file, for which I cannot use hard links.
I would like to intercept system calls by replacing them, but I have not been able to find any method of doing that in Linux > 3.0. Please suggest some method of doing that.
As far as hooking into the kernel and intercepting system calls goes, this is something I do in a security module I wrote:
Look at hijacks.c and symbols.c for the code; how they're used is in the hijack_syscalls function inside security.c. I haven't tried this on linux > 3.0 yet, but the same basic concept should still work.
It's a bit tricky, and you may have to write a good deal of kernel code to do the file copy before the unlink, but it's possible here.
One suggestion could be Filesystem in Userspace (FUSE). That is, write a FUSE module (which is, granted, in userspace) which intercepts filesystem-related syscalls, performs whatever tasks you want, and possibly calls the "default" syscall afterwards.
You could then mount certain directories with your FUSE filesystem and, for most of your cases, it seems like the default syscall behavior would not need to be overridden.
You can watch unlink events with inotify, though this might happen too late for your purposes (I don't know because I don't know your purposes, and you should experiment to find out). The in-kernel alternatives based on LSM (by which I mean SMACK, TOMOYO and friends) are really for Mandatory Access Control so may not be suitable for your purposes.
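For reference, a rough sketch of watching a directory for unlink events with inotify could look like this. The helper names watch_deletes and saw_delete are invented for the example, and the event is only delivered after the file name is already gone, so this shows the notification mechanism rather than a way to save the file in time.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <string.h>
#include <sys/inotify.h>
#include <sys/stat.h>
#include <unistd.h>

/* Start watching `dir` for deleted entries. Returns an inotify fd or -1. */
int watch_deletes(const char *dir)
{
    int fd;

    assert(dir != NULL);
    fd = inotify_init();
    if (fd < 0)
        return -1;
    if (inotify_add_watch(fd, dir, IN_DELETE) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/*
 * Wait up to timeout_ms for events, then scan one batch of them.
 * Returns 1 if a delete of `name` was reported, 0 if not, -1 on error.
 */
int saw_delete(int fd, const char *name, int timeout_ms)
{
    char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    ssize_t n;
    char *pos;
    int r = poll(&pfd, 1, timeout_ms);

    if (r < 0)
        return -1;
    if (r == 0)
        return 0;               /* nothing pending within the timeout */
    n = read(fd, buf, sizeof buf);
    if (n < 0)
        return -1;
    pos = buf;
    while (pos < buf + n) {     /* events are variable-length records */
        struct inotify_event *ev = (struct inotify_event *)pos;

        if ((ev->mask & IN_DELETE) && ev->len > 0 &&
            strcmp(ev->name, name) == 0)
            return 1;
        pos += sizeof(struct inotify_event) + ev->len;
    }
    return 0;
}
```

The watch is placed on the directory, not on the file, because IN_DELETE is reported for entries removed from a watched directory.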
If you want to handle deletions only, you could keep a "shadow" directory of hardlinks (created via link) to the files being watched (via inotify, as suggested by Graham Lee).
If the original is now unlinked, you still have the shadow file to handle as you want to, without using a kernel module.
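A small sketch of the shadow-hardlink idea, assuming both names live on the same filesystem (link() fails across filesystems). The helpers write_file, read_file and make_shadow are names made up for this example. After the original name is unlinked, the data is still reachable through the shadow name because both names refer to the same inode.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Create (or truncate) `path` and fill it with `text`. 0 on success. */
int write_file(const char *path, const char *text)
{
    size_t len = strlen(text);
    ssize_t n;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    n = write(fd, text, len);
    close(fd);
    return (n == (ssize_t)len) ? 0 : -1;
}

/* Read up to cap-1 bytes into buf and NUL-terminate. Returns bytes read or -1. */
ssize_t read_file(const char *path, char *buf, size_t cap)
{
    ssize_t n;
    int fd;

    assert(cap > 0);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    n = read(fd, buf, cap - 1);
    close(fd);
    if (n < 0)
        return -1;
    buf[n] = '\0';
    return n;
}

/* Give the file a second name in the shadow location. 0 on success. */
int make_shadow(const char *path, const char *shadow)
{
    return link(path, shadow);
}
```

Note that a hard link shares the inode, so it protects against deletion only: an in-place write through the original name changes the shadow as well.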

Can regular file reading benefit from nonblocking I/O?

It seems not to me and I found a link that supports my opinion. What do you think?
The content of the link you posted is correct. A regular file, opened in non-blocking mode, will always be "ready" for reading; when you actually try to read it, blocking (or more accurately as your source points out, sleeping) will occur until the operation can succeed.
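The "always ready" behaviour is easy to observe with poll(). The sketch below uses two helper names invented for the example, open_nonblocking and readable_now. A regular file opened with O_NONBLOCK reports readable immediately, while an empty pipe does not until data is written to it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Open a file read-only with O_NONBLOCK set. Returns the fd or -1. */
int open_nonblocking(const char *path)
{
    assert(path != NULL);
    return open(path, O_RDONLY | O_NONBLOCK);
}

/*
 * Ask poll() whether fd is readable right now (timeout 0, never waits).
 * Returns 1 if readable, 0 if not, -1 on error.
 */
int readable_now(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int r = poll(&pfd, 1, 0);

    if (r < 0)
        return -1;
    return (r == 1 && (pfd.revents & POLLIN)) ? 1 : 0;
}
```

So the readiness report says nothing about whether the following read() will have to wait for the disk.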
In any case, I think your source needs some sedatives. One angry person, that is.
I've been digging into this quite heavily for the past few hours and can attest that the author of the link you cited is correct. However, there appears to be "better" (using that term very loosely) support for non-blocking IO against regular files natively in the Linux kernel since v2.6. The "libaio" package contains a library that exposes the functionality offered by the kernel, but it has some caveats about which types of file systems are supported, and it's not portable to anything outside of Linux 2.6+.
And here's another good article on the subject.
You're correct that nonblocking mode has no benefit for regular files, and is not allowed to. It would be nice if there were a secondary flag that could be set, along with O_NONBLOCK, to change this, but due to the way cache and virtual memory work, it's actually not an easy task to define what correct "non-blocking" behavior for ordinary files would mean. Certainly there would be race conditions unless you allowed programs to lock memory associated with the file. (In fact, one way to implement a sort of non-sleeping IO for ordinary files would be to mmap the file and mlock the map. After that, on any reasonable implementation, read and write would never sleep as long as the file offset and buffer size remained within the bounds of the mapped region.)
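A minimal sketch of the mmap-plus-mlock idea from the parenthetical above. The struct locked_map and the helpers map_and_lock and unmap_and_close are names invented for this example. mlock() can fail (for instance under a low RLIMIT_MEMLOCK), so the sketch records whether the lock succeeded instead of treating failure as fatal.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

struct locked_map {
    int fd;        /* file kept open for later read()/pread() calls */
    void *addr;    /* start of the mapping */
    size_t len;    /* mapped length, equal to the file size */
    int locked;    /* 1 if mlock() succeeded, 0 otherwise */
};

/* Map the whole file read-only and try to pin its pages in RAM. 0 on success. */
int map_and_lock(const char *path, struct locked_map *m)
{
    struct stat st;

    assert(m != NULL);
    m->fd = open(path, O_RDONLY);
    if (m->fd < 0)
        return -1;
    if (fstat(m->fd, &st) < 0 || st.st_size == 0) {
        close(m->fd);
        return -1;
    }
    m->len = (size_t)st.st_size;
    m->addr = mmap(NULL, m->len, PROT_READ, MAP_SHARED, m->fd, 0);
    if (m->addr == MAP_FAILED) {
        close(m->fd);
        return -1;
    }
    /* Pin the pages; on failure we keep the mapping but note it is unpinned. */
    m->locked = (mlock(m->addr, m->len) == 0);
    return 0;
}

/* Drop the mapping (which also drops the lock) and close the file. */
void unmap_and_close(struct locked_map *m)
{
    munmap(m->addr, m->len);
    close(m->fd);
}
```

While the lock is held, reads that stay within the first len bytes of the file are expected to be served from memory, which is the "non-sleeping" behaviour the answer describes.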

How to "hibernate" a process in Linux by storing its memory to disk and restoring it later?

Is it possible to 'hibernate' a process in linux?
Just like 'hibernate' on a laptop, I would like to write all the memory used by a process to disk and free up the RAM. Then, later on, I could 'resume the process', i.e. read all the data back from disk into RAM and continue with my process.
I used to maintain CryoPID, which is a program that does exactly what you are talking about. It writes the contents of a program's address space, VDSO, file descriptor references and states to a file that can later be reconstructed. CryoPID started when there were no usable hooks in Linux itself and worked entirely from userspace (actually, it still does work, depending on your distro / kernel / security settings).
Problems were (indeed) sockets, pending RT signals, numerous X11 issues, and glibc's caching getpid() implementation, amongst many others. Randomization (especially of the VDSO) turned out to be insurmountable for the few of us working on it after Bernard walked away from it. However, it was fun and became the topic of several master's theses.
If you are just contemplating a program that can save its running state and restart directly into that state, it's far .. far .. easier to just save that information from within the program itself, perhaps when servicing a signal.
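As a rough illustration of saving state from within the program when a signal arrives: the struct app_state, the choice of SIGUSR1 and all function names below are assumptions made for this sketch. The handler only sets a flag, and the actual file write happens later from normal program flow, since stdio calls are not safe inside a signal handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>

struct app_state { long counter; };   /* hypothetical program state */

static volatile sig_atomic_t save_requested;

/* Signal handler: just record that a checkpoint was requested. */
static void on_sigusr1(int sig)
{
    (void)sig;
    save_requested = 1;
}

int install_save_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

int save_state(const char *path, const struct app_state *s)
{
    size_t n;
    FILE *f = fopen(path, "wb");

    if (f == NULL)
        return -1;
    n = fwrite(s, sizeof *s, 1, f);
    return (fclose(f) == 0 && n == 1) ? 0 : -1;
}

int load_state(const char *path, struct app_state *s)
{
    size_t n;
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return -1;
    n = fread(s, sizeof *s, 1, f);
    fclose(f);
    return (n == 1) ? 0 : -1;
}

/* Call from the main loop. Returns 1 if saved, 0 if nothing to do, -1 on error. */
int checkpoint_if_requested(const char *path, const struct app_state *s)
{
    assert(path != NULL && s != NULL);
    if (!save_requested)
        return 0;
    save_requested = 0;
    return (save_state(path, s) == 0) ? 1 : -1;
}
```

On startup the program would call load_state() and, if it succeeds, continue from the saved values instead of its defaults.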
I'd like to put a status update here, as of 2014.
The accepted answer suggests CryoPID as a tool to perform checkpoint/restore, but I found the project to be unmaintained and impossible to compile with recent kernels.
Now, I found two actively maintained projects providing the application checkpointing feature.
The first, the one I suggest because I have had better luck running it, is CRIU, which performs checkpoint/restore mainly in userspace and requires the kernel option CONFIG_CHECKPOINT_RESTORE to be enabled.
Checkpoint/Restore In Userspace, or CRIU (pronounced kree-oo, IPA: /krɪʊ/, Russian: криу), is a software tool for Linux operating system. Using this tool, you can freeze a running application (or part of it) and checkpoint it to a hard drive as a collection of files. You can then use the files to restore and run the application from the point it was frozen at. The distinctive feature of the CRIU project is that it is mainly implemented in user space.
The second is DMTCP; quoting from their main page:
DMTCP (Distributed MultiThreaded Checkpointing) is a tool to transparently checkpoint the state of multiple simultaneous applications, including multi-threaded and distributed applications. It operates directly on the user binary executable, without any Linux kernel modules or other kernel modifications.
There is also a nice Wikipedia page on the topic: Application_checkpointing
The answers mentioning ctrl-z are really talking about stopping the process with a signal, in this case SIGTSTP. You can issue a stop signal with kill:
kill -STOP <pid>
That will suspend execution of the process. It won't immediately free the memory used by it, but as memory is required for other processes the memory used by the stopped process will be gradually swapped out.
When you want to wake it up again, use
kill -CONT <pid>
The more complicated solutions, like CryoPID, are really only needed if you want the stopped process to be able to survive a system shutdown/restart - it doesn't sound like you need that.
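The same stop/continue sequence can be driven from C with kill(). In this sketch the helper names are made up, and the target is a child process created by fork(), which is what allows waitpid() to confirm the state change. For an unrelated process you would only call kill() and skip the waitpid() confirmation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that does nothing but wait for signals. Returns its pid or -1. */
pid_t spawn_idle_child(void)
{
    pid_t pid = fork();

    if (pid == 0) {
        for (;;)
            pause();
    }
    return pid;
}

/* Equivalent of `kill -STOP <pid>`, then wait until the child is reported stopped. */
int suspend_process(pid_t pid)
{
    int status;

    assert(pid > 0);
    if (kill(pid, SIGSTOP) < 0)
        return -1;
    if (waitpid(pid, &status, WUNTRACED) != pid)
        return -1;
    return WIFSTOPPED(status) ? 0 : -1;
}

/* Equivalent of `kill -CONT <pid>`, then wait until the child is reported running. */
int resume_process(pid_t pid)
{
    int status;

    assert(pid > 0);
    if (kill(pid, SIGCONT) < 0)
        return -1;
    if (waitpid(pid, &status, WCONTINUED) != pid)
        return -1;
    return WIFCONTINUED(status) ? 0 : -1;
}
```

SIGSTOP cannot be caught or ignored, so the target needs no cooperation, unlike the SIGTSTP that Ctrl-Z sends.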
The Linux kernel has now partially implemented the checkpoint/restart features; the status is here.
Some useful information is on LWN (Linux Weekly News): ......
So the answer is "YES"
The issue is restoring the streams - files and sockets - that the program has open.
When your whole OS hibernates, the local files and such can obviously be restored. Network connections can't, but code that accesses the network typically does more error checking and survives the error conditions (or ought to).
If you did per-program hibernation (without application support), how would you handle open files? What if another process accesses those files in the interim? etc?
Maintaining state when the program is not loaded is going to be difficult.
Simply suspending the threads and letting it get swapped to disk would have much the same effect?
Or run the program in a virtual machine and let the VM handle suspension.
Short answer is "yes, but not always reliably". Check out CryoPID:
Open files will indeed be the most common problem. CryoPID states explicitly:
Open files and offsets are restored. Temporary files that have been unlinked and are not accessible on the filesystem are always saved in the image. Other files that do not exist on resume are not yet restored. Support for saving file contents for such situations is planned.
The same issues will also affect TCP connections, though CryoPID supports tcpcp for connection resuming.
I extended Cryopid, producing a package called Cryopid2, available from SourceForge. This can migrate a process as well as hibernate it (along with any open files and sockets; data in sockets/pipes is sucked into the process on hibernation and spat back into these when the process is restarted).
The reason I have not been active with this project is that I am not a kernel developer. Both this and the original cryopid need to get someone on board who can get them running with the latest kernels (e.g. Linux 3.x).
The Cryopid method does work, and is probably the best solution to general purpose process hibernation/migration in Linux I have come across.
The short answer is "yes." You might start by looking at this for some ideas: ELF executable reconstruction from a core image (
As others have noted, it's difficult for the OS to provide this functionality, because the application needs to have some error checking builtin to handle broken streams.
However, on a side note, some programming languages and tools that use virtual machines explicitly support this functionality, such as the Self programming language.
This is sort of the ultimate goal of a clustered operating system. Matthew Dillon has put a lot of effort into implementing something like this in his DragonFly BSD project.
Adding another workaround: you can use VirtualBox. Run your applications in a regular virtual machine and simply "save the machine state" whenever you want.
I know this is not an answer, but I thought it could be useful when there are no real options.
If for any reason you don't like VirtualBox, VMware and QEMU are just as good.
Ctrl-Z increases the chances the process's pages will be swapped, but it doesn't free the process's resources completely. The problem with freeing a process's resources completely is that things like file handles, sockets are kernel resources the process gets to use, but doesn't know how to persist on its own. So Ctrl-Z is as good as it gets.
There was some research on checkpoint/restore for Linux back in the 2.2 and 2.4 days, but it never made it past prototype. It is possible (with the caveats described in the other answers) for certain values of possible: if you can write a kernel module to do it, it is possible. But for the common value of possible (can I do it from the shell on a commercial Linux distribution), it is not yet possible.
There's Ctrl+Z in Linux, but I'm not sure it offers the features you specified. I suspect you asked this question because it doesn't.
