I want to write a system call interposition by using Utrace. I understood that Utrace project has been abandoned, but part of its code is used on kprobe and uprobe.
I haven't understood really well how these work. Especially uprobe Can you explain what difference exists between them? And can I use uprobe without writing a module to check which are the actual parameters for a system call?
thanks
Kprobe creates and manages probepoints in kernel code, that is, you want to probe some kernel function, say, do_sys_open(). You need to take a look at Documentation/trace/kprobetrace.txt to get some usage of kprobe.
Uprobe creates and manages probepoints in user applications, that is, you want to probe some user-space function, but the probe is run in the kernel space on behalf of the probed process. You need to take a look at Documentation/trace/uprobetracer.txt to get the basic usage of uprobe, to see what it aims for.
Related
I've tried to understand the behavior of the function clock_gettime by looking at the source code of the linux kernel.
I'm currently using a 4.4.0-83-lowlatency but I only could get the 4.4.76 source files (but it should be close enough).
My first issue is that there is several occurrence of the function. I chose pc_clock_gettime which appears to be the closest and the only one handling CLOCK_MONOTONIC_RAW but if I'm wrong, please correct me.
I tracked back the execution flow of the function and came to a mysterious ravb_ptp_gettime64 and ravb_ptp_time_read which is related to the Ethernet driver.
So... If I understand correctly when I ask the system to give me the time, it ask to the Ethernet driver ?
This is the first time I looked into kernel code so I'm not used to it. If someone could give me an explanation of "how" and "why", it would be marvelous.
clock_gettime use a mechanism named vDSO (Virutal Dynamic Shared Object). It's a shared library which is mapped in the user space by the kernel.
vDSO allow the use of syscall frequently without a drawback on performances. So the kernel "puts" time informations into memory which user programm can access. In the end, it won't be a system call but only a simple function call.
I am making a /proc entry for my driver. So, in the read callback function the first argument is the location into which we write the data intended for the user. I searched on how to write the data in it and i could see that everybody is using sprintf for this purpose. I am surprised to see that it works in kernel space. However this should be wrong to use a user space function in kernel space. Also i cant figure out how to write in that location without using any user space function like strcpy, sprintf, etc. I am using kernel version 3.9.10. Please suggest me how i should do this without using sprintf or any other user space function.
Most of the 'normal' user-space functions would make no sense in kernel code, so they are not available in the kernel.
However, some functions like sprintf, strcpy, or memcpy are useful in kernel code, so the kernel implements them (more or less completely) and makes them available for drivers.
See include/linux/kernel.h and string.h.
sprintf is a kernel-space function in Linux. It is totally separate from its user-space namesake and may or may not work identically to it.
Just because a function in user-space exist, it does not mean an identically named function in kernel-space cannot.
I want to capture all the system calls on a file system in great details. E.g. for write system call, I want to record the target file, number of bytes written and the offset that write occurs.
Currently, I want to implement such a logger with inotify. However, it cannot provide such details. E.g. for write it does not provide number of bytes written and offset.
An alternative is to use bbfs implemented on fuse. However, it will introduce overhead when logging system calls and delay user operations to some un-tolerable degree.
Is there some library that can capture system calls on file system, just like ptrace when logging all system calls issued by a process?
There are many options for tracing in Linux. But this sounds like a pretty simple case. Have you investigated simply using the strace utility? It has lots of options that can control tracing granularity, will log arguments to almost all syscalls (including buffer contents if you want that) and exists and works basically everywhere without any setup beyond installing the package.
How about write your own profiling tool using a wrapper? See GCC -wrapper:
-wrapper
Invoke all subcommands under a wrapper program. The name of the wrapper program and its parameters are passed as a comma separated list.
I am trying to measure the performance of some library calls. My primary measurement tool is the rdtsc call. After doing some reading I realize that I need to disable preemption and interrupts in order to get the most accurate readings. Can someone help me figure out how to do these? I know that pthreads have a 'set affinity' mechanism. Is that enough to get the job done?
I also read somewhere that I can make calls into the kernel of the sort
preempt_disable()
raw_local_irq_save(...)
Is there any benefit to using one approach over the other? I tried the latter approach and got this error.
error: 'preempt_disable' was not declared in this scope
which can be fixed by including linux/preempt.h but the compiler still complains.
linux/preempt.h: No such file or directory
Obviously I have not done any kernel hacking and I could not find this file on my system anywhere. I am really hoping I wont have to install a new linux kernel. :)
Thanks for your input.
Pinning a pthread to a single CPU can be done using pthread_setaffinity_np
But what you want to achieve at the end is not so simple. I'll explain you why.
preempt.h is part of the Linux Kernel source. Its located here. You need to have kernel sources with you. Anyways, you need to write a kernel module to access it, you cannot use it from user space. Learn how to write a kernel module here. Same is the case with functions preempt_disable and other interrupt disabling kernel functions
Now the point is, pthreads are in user space and your preemption disabling function is in kernel space. How to interact?
Either you need to write a new system call of your own where you do your preemption and interrupt disabling and call it from user space. Or you need to resort to other Kernel-User Space Interfaces like procfs, sysfs, ioctl etc
But I am really skeptical as to how all these will help you to benchmark library functions. You may want to have a look at how performance is typically measured using rdtsc
Is there any way to add a system call dynamic, such as through a module? I have found places where I can override an existing system call with a module by just changing the sys_call_table[] array to get my overridden function instead of the native when my module is installed, but can you do this with a new system call and a module?
No, sys_call_table is of fixed size:
const sys_call_ptr_t sys_call_table[__NR_syscall_max+1] = { ...
The best you can do, as you probably already discovered, is to intercept existing system calls.
Intercepting existing system call (to have something done in the kernel) is not the right way in some cases. For eg, if your userspace drivers need to execute something in kernel, send something there, or read something from kernel?
Usually for drivers, the right way is to use ioctl() call, which is just one system call, but it can call different kernel functions or driver modules - by passing different parameters through ioctl().
The above is for user-controlled kernel code execution.
For data passing, you can use procfs, or sysfs drivers to talk to the kernel.
PS: when you intercept system call, which generally affect the entire OS, you have to worry about how to solve the problem of doing it safely: what if someone else is halfway calling the system call, and then you modify/intercept the codes?