Source of clock_gettime

Source of clock_gettime - linux

I've tried to understand the behavior of the function clock_gettime by looking at the source code of the linux kernel.
I'm currently using a 4.4.0-83-lowlatency but I only could get the 4.4.76 source files (but it should be close enough).
My first issue is that there is several occurrence of the function. I chose pc_clock_gettime which appears to be the closest and the only one handling CLOCK_MONOTONIC_RAW but if I'm wrong, please correct me.
I tracked back the execution flow of the function and came to a mysterious ravb_ptp_gettime64 and ravb_ptp_time_read which is related to the Ethernet driver.
So... If I understand correctly when I ask the system to give me the time, it ask to the Ethernet driver ?
This is the first time I looked into kernel code so I'm not used to it. If someone could give me an explanation of "how" and "why", it would be marvelous.

clock_gettime use a mechanism named vDSO (Virutal Dynamic Shared Object). It's a shared library which is mapped in the user space by the kernel.
vDSO allow the use of syscall frequently without a drawback on performances. So the kernel "puts" time informations into memory which user programm can access. In the end, it won't be a system call but only a simple function call.

Related

How do I open a file in a kernel module if calling process is in user space?

I am trying to create a character device driver that dumps /etc/shadow when read from as a non-privileged user. This is for purely academic purposes of course.
I was reading about how reading/writing files in kernel space opens a system to possible exploits. I am trying to implement this in practice.
Please spare me the "don't touch the filesystem in kernel mode" talk. I am precisely trying to exploit the nuances of doing so.
Problem is that the only way I have found so far that works to open a file in kernel mode is filp_open, which is currently producing EACCESS when I read from the device file as a non-privileged user. This was confounding at first as I assumed that I can do anything in kernel space.
For example, when I cat the device file I have created as a non-root user, filp_open produces EACCESS in kernel space???
Further investigation has led me to believe that filp_open checks the capabilities of the calling process. This would make sense as it is used internally by open(), but I am in kernel mode here! There must be a way!
I am very new to programming in kernel space. I have extensive application C experience, but I am finding it difficult to navigate the kernel documentation for precisely what I am looking for. Additionally, it seems that more and more symbols within the kernel are not exported for use in modules. As I am developing an exploit proof of concept, I would like it to work without recompiling the kernel. I am finding a lot of code (vfs and syscalls) that is deprecated as the symbols are no longer exported to kernel modules.
Is what I am trying to do a thing that is specifically engineered against? Loading a kernel module requires root to begin with, so I would see this more in the lens of a persistence focused attack rather than an access one.
Also, I got the proof of concept working by just reading from the file when the module is loaded, but this is no fun! Any pointers here are much appreciated.

After some rethinking and digging I have found two solutions to my problem. Thank you to Tsyvarev and stark for the pointers.
Solution 1
The first solution is to elevate the privileges of the calling process before making a call of filp_open. This is also basically making a rootkit, so not as interesting.
Here is a link to the guide that I found on the subject.
https://0x00sec.org/t/kernel-rootkits-getting-your-hands-dirty/1485
Solution 2
The module will have an init function that by nature must be run with elevated privs when the module is loaded. So you can open the file pointer there and just close it when the module is unloaded. Caveats are that you have the file pointer open the whole time, so all of the gotchas there are still present. Better to only read, writing is where things can get a bit tricky. This is the solution I chose in the interim, as I didn't want this thing to be a full rootkit.
Another direction is workqueue or to spawn a thread. Probably the most tricky but also the most inline with what my original vision of this demo was. I did not test this direction but it probably is the best solution.

Using user-space functions like sprintf in the kernel, or not?

I am making a /proc entry for my driver. So, in the read callback function the first argument is the location into which we write the data intended for the user. I searched on how to write the data in it and i could see that everybody is using sprintf for this purpose. I am surprised to see that it works in kernel space. However this should be wrong to use a user space function in kernel space. Also i cant figure out how to write in that location without using any user space function like strcpy, sprintf, etc. I am using kernel version 3.9.10. Please suggest me how i should do this without using sprintf or any other user space function.

Most of the 'normal' user-space functions would make no sense in kernel code, so they are not available in the kernel.
However, some functions like sprintf, strcpy, or memcpy are useful in kernel code, so the kernel implements them (more or less completely) and makes them available for drivers.
See include/linux/kernel.h and string.h.

sprintf is a kernel-space function in Linux. It is totally separate from its user-space namesake and may or may not work identically to it.
Just because a function in user-space exist, it does not mean an identically named function in kernel-space cannot.

kprobe vs uprobe system call interposition

I want to write a system call interposition by using Utrace. I understood that Utrace project has been abandoned, but part of its code is used on kprobe and uprobe.
I haven't understood really well how these work. Especially uprobe Can you explain what difference exists between them? And can I use uprobe without writing a module to check which are the actual parameters for a system call?
thanks

Kprobe creates and manages probepoints in kernel code, that is, you want to probe some kernel function, say, do_sys_open(). You need to take a look at Documentation/trace/kprobetrace.txt to get some usage of kprobe.
Uprobe creates and manages probepoints in user applications, that is, you want to probe some user-space function, but the probe is run in the kernel space on behalf of the probed process. You need to take a look at Documentation/trace/uprobetracer.txt to get the basic usage of uprobe, to see what it aims for.

pinning a pthread to a single core

I am trying to measure the performance of some library calls. My primary measurement tool is the rdtsc call. After doing some reading I realize that I need to disable preemption and interrupts in order to get the most accurate readings. Can someone help me figure out how to do these? I know that pthreads have a 'set affinity' mechanism. Is that enough to get the job done?
I also read somewhere that I can make calls into the kernel of the sort
preempt_disable()
raw_local_irq_save(...)
Is there any benefit to using one approach over the other? I tried the latter approach and got this error.
error: 'preempt_disable' was not declared in this scope
which can be fixed by including linux/preempt.h but the compiler still complains.
linux/preempt.h: No such file or directory
Obviously I have not done any kernel hacking and I could not find this file on my system anywhere. I am really hoping I wont have to install a new linux kernel. :)
Thanks for your input.

Pinning a pthread to a single CPU can be done using pthread_setaffinity_np
But what you want to achieve at the end is not so simple. I'll explain you why.
preempt.h is part of the Linux Kernel source. Its located here. You need to have kernel sources with you. Anyways, you need to write a kernel module to access it, you cannot use it from user space. Learn how to write a kernel module here. Same is the case with functions preempt_disable and other interrupt disabling kernel functions
Now the point is, pthreads are in user space and your preemption disabling function is in kernel space. How to interact?
Either you need to write a new system call of your own where you do your preemption and interrupt disabling and call it from user space. Or you need to resort to other Kernel-User Space Interfaces like procfs, sysfs, ioctl etc
But I am really skeptical as to how all these will help you to benchmark library functions. You may want to have a look at how performance is typically measured using rdtsc

Execute code in process's stack, on recent Linux

I want to use ptrace to write a piece of binary code in a running process's stack.
However, this causes segmentation fault (signal 11).
I can make sure the %eip register stores the pointer to the first instruction that I want to execute in the stack. I guess there is some mechanism that linux protects the stack data to be executable.
So, does anyone know how to disable such protection for stack. Specifically, I'm trying Fedora 15.
Thanks a lot!
After reading all replies, I tried execstack, which really makes code in stack executable. Thank you all!

This is probably due to the NX bit on modern processors. You may be able to disable this for your program using execstack.
http://advosys.ca/viewpoints/2009/07/disabling-the-nx-bit-for-specific-apps/
http://linux.die.net/man/8/execstack

As already mentioned it is due to the NX bit. But it is possible. I know for sure that gcc uses it itself for trampolines (which are a workaround to make e.g. function pointers of nested functions). I dont looked at the detailes, but I would recommend a look at the gcc code. Search in the sources for the architecture specific macro TARGET_ASM_TRAMPOLINE_TEMPLATE, there you should see how they do it.
EDIT: A quick google for that macro, gave me the hint: mprotect is used to change the permissions of the memory page. Also be carefull when you generate date and execute it - you maybe have in addition to flush the instruction cache.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string