Disabling vsyscalls in Linux - linux

I'm working on a piece of software that monitors other processes' system calls using ptrace(2). Unfortunately most modern operating system implement some kind of fast user-mode syscalls that are called vsyscalls in Linux.
Is there any way to disable the use of vsyscalls/vDSO for a single process or, if that is not possible, for the whole operating system?

Try echo 0 > /proc/sys/kernel/vsyscall64
If you're trying to ptrace on gettimeofday calls and they aren't showing up, what time source is the system using (pmtimer, acpi, tsc, hpet, etc). I wonder if you'd humor me by trying to force your timer to something older like pmtimer. It's possible one of the many gtod timer specific optimizations is causing your ptrace calls to be avoided, even with vsyscall set to zero.

Is there any way to disable the use of vsyscalls/vDSO for a single process or, if that is not possible, for the whole operating system?
It turns out there IS a way to effectively disable linking vDSO for a single process without disabling it system-wide using ptrace!
All you have to do is to stop the traced process before it returns from execve and remove the AT_SYSINFO_EHDR entry from the auxiliary vector (which comes directly after environment variables along the memory region pointed to in rsp). PTRACE_EVENT_EXEC is a good place to do this.
AT_SYSINFO_EHDR is what the kernel uses to tell the system linker where vDSO is mapped in the process's address space. If this entry is not present, ld seems to act as if the system hasn't mapped a vDSO.
Note that this doesn't somehow unmap the vDSO from your processes memory, it merely ignores it when linking other shared libraries. A malicious program will still be able to interact with it if the author really wanted to.
I know this answer is a bit late, but I hope this information will spare some poor soul a headache

For newer systems echo 0 > /proc/sys/kernel/vsyscall64 might not work. In Ubuntu 16.04 vDSO can be disabled system-wide by adding the kernel parameter vdso=0 in /etc/default/grub under the parameter: GRUB_CMDLINE_LINUX_DEFAULT.
IMPORTANT: Parameter GRUB_CMDLINE_LINUX_DEFAULT might be overwriten by other configuration files in /etc/default/grub.d/..., so double check when to add your custom configuration.

Picking up on Tenders McChiken's approach, I did create a wrapper that disables vDSO for an arbitrary binary, without affecting the rest of the system: https://github.com/danteu/novdso
The general procedure is quite simple:
use ptrace to wait for return from execve(2)
find the address of the auxvector
overwrite the AT_SYSINFO_EHDR entry with AT_IGNORE, telling the application to ignore the following value

I know this is an older question, but nobody has mentioned a third useful way of disabling the vDSO on a per-process basis. You can overwrite the libc functions with your own that performs the actual system call using LD_PRELOAD.
A simple shared library for overriding the gettimeofday and time functions, for example, could look like this:
vdso_override.c:
#include <time.h>
#include <sys/time.h>
#include <unistd.h>
#include <sys/syscall.h>
int gettimeofday(struct timeval *restrict tv, struct timezone *restrict tz)
{
return syscall(__NR_gettimeofday, (long)tv, (long)tz, 0, 0, 0, 0);
}
time_t time(time_t *tloc)
{
return syscall(__NR_time, (long)tloc, 0, 0, 0, 0, 0);
}
This uses the libc wrapper to issue a raw system call (see syscall(2)), so the vDSO is circumvented. You would have to overwrite all system calls that the vDSO exports on your architecture in this way (listed at vdso(7)).
Compile with
gcc -fpic -shared -o vdso_override.so vdso_override.c
Then run any program in which you want to disable VDSO calls as follows:
LD_PRELOAD=./vdso_override.so <some program>
This of course only works if the program you are running is not actively trying to circumvent this. While you can override a symbol using LD_PRELOAD, if the target program really wants to, there is a way to find the original symbol and use that instead.

Related

How to link kernel functions to user-space program?

I have a user-space program (Capstone). I would like to use it in FreeBSD kernel. I believe most of them have the same function name, semantics and arguments (in FreeBSD, the kernel printf is also named printf). First I built it as libcapstone.a library, and link my program with it. Since the include files are different between Linux user-space and FreeBSD kernel, the library cannot find symbols like sprintf, memset and vsnprintf. How could I make these symbols (from FreeBSD kernel) visible to libcapstone.a?
If directly including header files like <sys/systm.h> in Linux user-space source code, the errors would be like undefined type u_int or u_char, even if I add -D_BSD_SOURCE to CFLAGS.
Additionally, any other better ways to do this?
You also need ; take a look at kernel man pages, eg "man 9 printf". They list required includes at the top.
Note, however, that you're trying to do something really hard. Some basic functions (eg printf) might be there; others differ completely (eg malloc(9)), and most POSIX APIs are simply not there. You won't be able to use open(2), socket(2), or fork(2).

System calls : difference between sys_exit(), SYS_exit and exit()

What is the difference between SYS_exit, sys_exit() and exit()?
What I understand :
The linux kernel provides system calls, which are listed in man 2 syscalls.
There are wrapper functions of those syscalls provided by glibc which have mostly similar names as the syscalls.
My question : In man 2 syscalls, there is no mention of SYS_exit and sys_exit(), for example. What are they?
Note : The syscall exit here is only an example. My question really is : What are SYS_xxx and sys_xxx()?
I'll use exit() as in your example although this applies to all system calls.
The functions of the form sys_exit() are the actual entry points to the kernel routine that implements the function you think of as exit(). These symbols are not even available to user-mode programmers. That is, unless you are hacking the kernel, you cannot link to these functions because their symbols are not available outside the kernel. If I wrote libmsw.a which had a file scope function like
static int msw_func() {}
defined in it, you would have no success trying to link to it because it is not exported in the libmsw symbol table; that is:
cc your_program.c libmsw.a
would yield an error like:
ld: cannot resolve symbol msw_func
because it isn't exported; the same applies for sys_exit() as contained in the kernel.
In order for a user program to get to kernel routines, the syscall(2) interface needs to be used to effect a switch from user-mode to kernel mode. When that mode-switch (somtimes called a trap) occurs a small integer is used to look up the proper kernel routine in a kernel table that maps integers to kernel functions. An entry in the table has the form
{SYS_exit, sys_exit},
Where SYS_exit is an preprocessor macro which is
#define SYS_exit (1)
and has been 1 since before you were born because there hasn't been reason to change it. It also happens to be the first entry in the table of system calls which makes look up a simple array index.
As you note in your question, the proper way for a regular user-mode program to access sys_exit is through the thin wrapper in glibc (or similar core library). The only reason you'd ever need to mess with SYS_exit or sys_exit is if you were writing kernel code.
This is now addressed in man syscall itself,
Roughly speaking, the code belonging to the system call with number __NR_xxx defined in /usr/include/asm/unistd.h can be found in the Linux kernel source in the routine sys_xxx(). (The dispatch table for i386 can be found in /usr/src/linux/arch/i386/kernel/entry.S.) There are many exceptions, however, mostly because older system calls were superseded by newer ones, and this has been treated somewhat unsystematically. On platforms with proprietary operating-system emulation, such as parisc, sparc, sparc64, and alpha, there are many additional system calls; mips64 also contains a full set of 32-bit system calls.
At least now /usr/include/asm/unistd.h is a preprocessor hack that links to either,
/usr/include/asm/unistd_32.h
/usr/include/asm/unistd_x32.h
/usr/include/asm/unistd_64.h
The C function exit() is defined in stdlib.h. Think of this as a high level event driven interface that allows you to register a callback with atexit()
/* Call all functions registered with `atexit' and `on_exit',
in the reverse of the order in which they were registered,
perform stdio cleanup, and terminate program execution with STATUS. */
extern void exit (int __status) __THROW __attribute__ ((__noreturn__));
So essentially the kernel provides an interface (C symbols) called __NR_xxx. Traditionally people want sys_exit() which is defined with a preprocessor macro SYS_exit. This macro creates the sys_exit() function. The exit() function is part of the standard C library stdlib.h and ported to other operating systems that lack the Linux Kernel ABI entirely (there may not be __NR_xxx functions) and potentially don't even have sys_* functions available either (you could write exit() to send the interrupt or use VDSO in Assembly).

How to allocate a new TLS area with clone system call

Short version of question: What parameter do I need to pass to the clone system call on x86_64 Linux system if I want to allocate a new TLS area for the thread that I am creating.
Long version:
I am working on a research project and for something I am experimenting with I want to create threads using the clone system call instead of using pthread_create. However, I also want to be able to use thread local storage. I don't plan on creating many threads right now, so it would be fine for me to create a new TLS area for each thread that I create with the clone system call.
I was looking at the man page for clone and it has the following information about the flag for the TLS parameter:
CLONE_SETTLS (since Linux 2.5.32)
The newtls argument is the new TLS (Thread Local Storage) descriptor.
(See set_thread_area(2).)
So I looked at the man page for set_thread_area and noticed the following which looked promising:
When set_thread_area() is passed an entry_number of -1, it uses a
free TLS entry. If set_thread_area() finds a free TLS entry, the value of
u_info->entry_number is set upon return to show which entry was changed.
However, after experimenting with this some it appears that set_thread_area is not implemented in my system (Ubunut 10.04 on an x86_64 platform). When I run the following code I get an error that says: set_thread_area() failed: Function not implemented
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <errno.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <linux/unistd.h>
#include <asm/ldt.h>
int main()
{
struct user_desc u_info;
u_info.entry_number = -1;
int rc = syscall(SYS_set_thread_area,&u_info);
if(rc < 0) {
perror("set_thread_area() failed");
exit(-1);
}
printf("entry_number is %d",u_info.entry_number);
}
I also saw that when I use strace the see what happens when pthread_create is called that I don't see any calls to set_thread_area. I have also been looking at the nptl pthread source code to try to understand what they do when creating threads. But I don't completely understand it yet and I think it is more complex than what I'm trying to do since I don't need something that is as robust at the pthread implementation. I'm assuming that the set_thread_area system call is for x86 and that there is a different mechanism used for x86_64. But for the moment I have not been able to figure out what it is so I'm hoping this question will help me get some ideas about what I need to look at.
I am working on a research project and for something I am experimenting with I want to create threads using the clone system call instead of using pthread_create
In the exceedingly unlikely scenario where your new thread never calls any libc functions (either directly, or by calling something else which calls libc; this also includes dynamic symbol resolution via PLT), then you can pass whatever TLS storage you desire as the the new_tls parameter to clone.
You should ignore all references to set_thread_area -- they only apply to 32-bit/ix86 case.
If you are planning to use libc in your newly-created thread, you should abandon your approach: libc expects TLS to be set up a certain way, and there is no way for you to arrange for such setup when you call clone directly. Your new thread will intermittently crash when libc discovers that you didn't set up TLS properly. Debugging such crashes is exceedingly difficult, and the only reliable solution is ... to use pthread_create.
The other answer is absolutely correct in that setting up a thread outside of libc's control is guaranteed to cause trouble at a certain point. You can do it, but you can no longer rely on libc's services, definitely not on any of the pthread_* functions or thread-local variables (defined as such using __thread or thread_local).
That being said, you can set one of the segment registers used for TLS (GS and FS) even on x86-64. The system call to look for is prctl(ARCH_SET_GS, ...).
You can see an example comparing setting up TLS registers on i386 and x86-64 in this piece of code.

How to prohibit system calls, GNU/Linux

I'm currently working on the back-end of ACM-like public programming contest system. In such system, any user can submit a code source, which will be compiled and run automatically (which means, no human-eye pre-moderation is performed) in attempt to solve some computational problem.
Back-end is a GNU/Linux dedicated machine, where a user will be created for each contestant, all such users being part of users group. Sources sent by any particular user will be stored at the user's home directory, then compiled and executed to be verified against various test cases.
What I want is to prohibit usage of Linux system calls for the sources. That's because problems require platform-independent solutions, while enabling system calls for insecure source is a potential security breach. Such sources may be successfully placed in the FS, even compiled, but never run. I also want to be notified whenever source containing system calls was sent.
By now, I see the following places where such checker may be placed:
Front-end/pre-compilation analysis - source already checked in the system, but not yet compiled. Simple text checker against system calls names. Platform-dependent, compiler-independent, language-dependent solution.
Compiler patch - crash GCC (or any other compiler included in the tool-chain) whenever system call is encountered. Platform-dependent, compiler-dependent, language-independent solution (if we place checker "far enough"). Compatibility may also be lost. In fact, I dislike this alternative most.
Run-time checker - whenever system call is invoked from the process, terminate this process and report. This solution is compiler and language independent, but depends on the platform - I'm OK with that, since I will deploy the back-end on similar platforms in short- and mid-terms.
So the question is: does GNU/Linux provide an opportunity for administrator to prohibit system calls usage for a usergroup, user or particular process? It may be a security policy or a lightweight GNU utility.
I tried to Google, but Google disliked me today.
mode 1 seccomp allows a process to limit itself to exactly four syscalls: read, write, sigreturn, and _exit. This can be used to severely sandbox code, as seccomp-nurse does.
mode 2 seccomp (at the time of writing, found in Ubuntu 12.04 or patch your own kernel) provides more flexibility in filtering syscalls. You can, for example, first set up filters, then exec the program under test. Appropriate use of chroot or unshare can be used to prevent it from re-execing anything else "interesting".
I think you need to define system call better. I mean,
cat <<EOF > hello.c
#include <stdio.h>
int main(int argc,char** argv) {
fprintf(stdout,"Hello world!\n");
return 0;
}
EOF
gcc hello.c
strace -q ./a.out
demonstrates that even an apparently trivial program makes ~27 system calls.
You (I assume) want to allow calls to the "standard C library", but those in turn will be implemented in terms of system calls. I guess what I'm trying to say is that run-time checking is less feasible than you might think (using strace or similar anyway).

Compile linux kernel (2.6) module including non kernel headers

Is it possible to compile a linux kernel(2.6) module that includes functionality defined by non-kernel includes?
For example:
kernelmodule.h
#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h> // printk()
// ...
#include <openssl/sha.h>
// ...
Makefile
obj-m := kernelmodule.o
all:
$(MAKE) -C /lib/modules/`uname -r`/build M=`pwd` modules
clean:
$(MAKE) -C /lib/modules/`uname -r`/build M=`pwd` clean
$(RM) Module.markers modules.order
The kernel module I have written and are trying to compile contains functionality found in a number of openssl include files.
The standard makefile presented above doesn't allow includes outside of the linux headers. Is it possible to include this functionality, and if so, could you please point me in the right direction.
Thanks,
Mike
The kernel cannot use userspace code and must stand alone (i.e. be completely self contained, no libraries), therefore it does not pick up standard headers.
It is not clear what benefit trying to pick up userspace headers is. If there are things in there that it would be valid to use (constants, some macros perhaps provided they don't call any userspace functions), then it may be better to duplicate them and include only the kernel-compatible parts that you need.
It is not possible to link the kernel with libraries designed for userspace use - even if they don't make any OS calls - because the linking environment in the kernel cannot pick them up.
Instead, recompile any functions to be used in the kernel (assuming they don't make any OS or library calls - e.g. malloc - in which case they'll need to be modified anyway). Incorporate them into your own library to be used in your kernel modules.
Recent versions of linux contain cryptographic functions anyway, including various SHA hashes - perhaps you can use one of those instead.
Another idea would be to stop trying to do crypto in kernel-space and move the code to userspace. Userspace code is easier to write / debug / maintain etc.
I have taken bits of userspace code that I've written and converted it to work in kernel space (i.e. using kmalloc(), etc), it's not that difficult. However, you are confined to the kernel's understanding of C, not userspace, which differs slightly .. especially with various standard int types.
Just linking against user space DSO's is not possible — the Linux kernel is monolithic, completely self contained. It does not use userspace libc, libraries or other bits as others have noted.
9/10 times, you will find what you need somewhere in the kernel. It's very likely that someone else ran into the same need you have and wrote some static functions in some module to do what you want .. just grab those and re-use them.
In the case of crypto, as others have said, just use what's in the kernel. One thing to note, you'll need them to be enabled in kconfig which may or may not happen depending on what the user selects when building it. So, watch out for dependencies and be explicit, you may have to hack a few entries in kconfig that also select the crypto API you want when your module is selected. Doing that can be a bit of a pain when building out of tree.
So on the one hand we have "just copy and rename stuff while adding overall bloat", on the other you have "tell people they must have the full kernel source". It's one of the quirks that come with a monolithic kernel.
With a Microkernel, almost everything runs in userspace, no worries linking against a DSO for some driver ... it's a non issue. Please don't take that statement as a cue to re-start kernel design philosophy in comments, that's not in the scope of this question.

Resources