Printing CPU number similar to ftrace - linux

I want to print CPU number on which the current process or function is executing similar to ftrace like this:
TASK-PID CPU# TIMESTAMP FUNCTION
| | | | |
<idle>-0 [002] 23636.756054: ttwu_do_activate.constprop.89 <-try_to_wake_up
<idle>-0 [002] 23636.756054: activate_task <-ttwu_do_activate.constprop.89
<idle>-0 [002] 23636.756055: enqueue_task <-activate_task
How do I get that value? I suppose its there in some function of start_kernel function. Can we print its value? I am using linux-4.1 kernel.

For printing current cpu in kernel, the cpu field of task_struct can be used. Note that the kernel configuration CONFIG_THREAD_INFO_IN_TASK should be enabled. This will work for 4.9 kernel.
printk("My current cpu is %d\n", current->cpu);
smp_processor_id() also can be used if cpu field is not available.

Related

The trace-cmd report do not show the function name

I used the following command to trace the kernel.
$ trace-cmd record -p function_graph ls
$ trace-cmd report
but I saw the following result just show the address instead of the function name.
MtpServer-4877 [000] 1706.014074: funcgraph_exit: + 23.875 us | }
trace-cmd-4892 [001] 1706.014075: funcgraph_entry: 2.000 us | ffff0000082cdc24();
trace-cmd-4895 [002] 1706.014076: funcgraph_entry: 1.250 us | ffff00000829e704();
MtpServer-4877 [000] 1706.014076: funcgraph_entry: 1.375 us | ffff0000083266bc();
kswapd0-1024 [003] 1706.014078: funcgraph_entry: | ffff00000827956c() {
kswapd0-1024 [003] 1706.014081: funcgraph_entry: | ffff00000827801c() {
trace-cmd-4895 [002] 1706.014081: funcgraph_entry: 1.375 us | ffff0000082bd8b4();
MtpServer-4877 [000] 1706.014082: funcgraph_entry: 1.375 us | ffff0000082ccefc();
trace-cmd-4892 [001] 1706.014082: funcgraph_entry: | ffff0000082c5adc() {
kswapd0-1024 [003] 1706.014084: funcgraph_entry: 1.500 us | ffff00000828c8f0();
trace-cmd-4892 [001] 1706.014085: funcgraph_entry: 1.250 us | ffff0000082c5a58();
MtpServer-4877 [000] 1706.014088: funcgraph_entry: 1.125 us | ffff0000082e3a30();
trace-cmd-4895 [002] 1706.014089: funcgraph_exit: + 19.125 us | }
kswapd0-1024 [003] 1706.014090: funcgraph_entry: 1.500 us | ffff0000090b6c04();
trace-cmd-4895 [002] 1706.014090: funcgraph_entry: | ffff0000082d4ffc() {
trace-cmd-4892 [001] 1706.014092: funcgraph_exit: 6.875 us | }
trace-cmd-4895 [002] 1706.014093: funcgraph_entry: 1.000 us | ffff0000090b3a40();
May I know how to show the exact function name on the trace-cmd result?
I ran into this issue, and for me it was caused by /proc/kallsyms (a file which mappings from kernel addresses to symbol names) displaying all zeros for kernel addresses.
Since this is a procfs "file" it's behavior can, and does vary depending on what context you are accessing it from.
In this case, it is due to security checks.
If you pass the checks, the addresses will be nonzero and look similar to this:
000000000000c000 A exception_stacks
0000000000014000 A entry_stack_storage
0000000000015000 A espfix_waddr
0000000000015008 A espfix_stack
If you fail the checks they will be zero, like this:
(In my case I did not have the CAP_SYSLOG capability because I was running in a container and systemd-nspawn's default behaviour was dropping the capability. [1])
0000000000000000 A exception_stacks
0000000000000000 A entry_stack_storage
0000000000000000 A espfix_waddr
0000000000000000 A espfix_stack
This is influenced by kernel settings and more information (along with the rather simple checking source code) can be found on the answers in Allow single user to access /proc/kallsyms
To fix this issue you need to start your trace-cmd with proper permissions / capabilities.
[1] This is what the capability settings look like for an nspawned process with no modifications:
# cat /proc/2894/status | grep -i cap
CapInh: 0000000000000000
CapPrm: 00000000fdecafff
CapEff: 00000000fdecafff
CapBnd: 00000000fdecafff
CapAmb: 0000000000000000
This is what it looks like for a process started by root in a "normal" context:
# cat /proc/616428/status | grep -i cap
CapInh: 0000000000000000
CapPrm: 000001ffffffffff
CapEff: 000001ffffffffff
CapBnd: 000001ffffffffff
CapAmb: 0000000000000000
Alternative solution
This is speculation on my part, but I expect this manner of censoring in the kallsyms file is to prevent defeating kASLR by just reading addresses of memory locations out of the file.
It seems that if you directly use the tracefs ftrace interfaces, as opposed to reading raw trace events and then resolving symbols (I don't know if there is a way to make trace-cmd do this), then the kernel will resolve the symbol names for you (and this way does not expose kernel addresses) and so this is not an issue.
For example (note how it says !cap_syslog):
# capsh --print
Current: =ep cap_syslog-ep
Bounding set = [...elided for brevity ...]
Ambient set =
Current IAB: !cap_syslog
Securebits: 00/0x0/1'b0 (no-new-privs=0)
secure-noroot: no (unlocked)
secure-no-suid-fixup: no (unlocked)
secure-keep-caps: no (unlocked)
secure-no-ambient-raise: no (unlocked)
uid=0(root) euid=0(root)
gid=0(root)
groups=0(root)
Guessed mode: HYBRID (4)
# echo 1 > events/enable
# echo function_graph > current_tracer
# echo 1 > tracing_on
# head trace
# tracer: function_graph
#
# CPU DURATION FUNCTION CALLS
# | | | | | | |
2) 0.276 us | } /* fpregs_assert_state_consistent */
1) 2.052 us | } /* exit_to_user_mode_prepare */
1) | syscall_trace_enter.constprop.0() {
1) | __traceiter_sys_enter() {
1) | /* sys_futex(uaddr: 7fc3565b9378, op: 80, val: 0, utime: 0, uaddr2: 0, val3: 0) */
1) | /* sys_enter: NR 202 (7fc3565b9378, 80, 0, 0, 0, 0) */

harmful printk effect in linux kernel module

Here is a part of my code.
for (i=0;i<29;i++)
{
*reg0= (unsigned int)(src[i][0]<<24 | (src[i][1]<<16) | (src[i][2]<<8) | src[i][3]);
printk("");
*reg4= (unsigned int)(src[i][4]<<24 | (src[i][5]<<16) | (src[i][6]<<8) | src[i][7]);
printk("");
*reg8= (unsigned int)(src[i][8]<<24 | (src[i][9]<<16) | (src[i][10]<<8) | src[i][11]);
printk("");
*reg12= (unsigned int)(src[i][12]<<24 | (src[i][13]<<16) | (src[i][14]<<8) | 0);
printk("");
}
it is a for loop used in a module built under petalinux. it works very well after doing insmod and execution the user application. But when I comment printk("") the module doesn't work any more and it seems like it gets lost in recursion. I have to suppress printk becase it takes much time execution.
Any help would be appreciated.
regards.

perf : How to check processess running on particular cpu

Is there any option in perf to look into processes running on a particular cpu /core, and how much percentage of that core is taken by each process.
Reference links would be helpful.
perf is intended to do a profiling which is not good fit for your case. You may try to do sampling /proc/sched_debug (if it is compiled in your kernel). For example you may check which process is currently running on CPU:
egrep '^R|cpu#' /proc/sched_debug
cpu#0, 917.276 MHz
R egrep 2614 37730.177313 ...
cpu#1, 917.276 MHz
R bash 2023 218715.010833 ...
By using his PID as a key, you may check how many CPU time in milliseconds it consumed:
grep se.sum_exec_runtime /proc/2023/sched
se.sum_exec_runtime : 279346.058986
However, as #BrenoLeitão mentioned, SystemTap is quite useful for your script. Here is script for your task.
global cputimes;
global cmdline;
global oncpu;
global NS_PER_SEC = 1000000000;
probe scheduler.cpu_on {
oncpu[pid()] = local_clock_ns();
}
probe scheduler.cpu_off {
if(oncpu[pid()] == 0)
next;
cmdline[pid()] = cmdline_str();
cputimes[pid(), cpu()] <<< local_clock_ns() - oncpu[pid()];
delete oncpu[pid()];
}
probe timer.s(1) {
printf("%6s %3s %6s %s\n", "PID", "CPU", "PCT", "CMDLINE");
foreach([pid+, cpu] in cputimes) {
cpupct = #sum(cputimes[pid, cpu]) * 10000 / NS_PER_SEC;
printf("%6d %3d %3d.%02d %s\n", pid, cpu,
cpupct / 100, cpupct % 100, cmdline[pid]);
}
delete cputimes;
}
It traces moments when process is running on CPU and stops execution on that (due to migration or sleeping) by attaching to scheduler.cpu_on and scheduler.cpu_off probes. Second probe calculates time difference between these events and saves it to cputimes aggregation along with process command line arguments.
timer.s(1) fires once per second -- it walks over aggregation and calculates percentage. Here is sample output for Centos 7 with bash running infinite loop:
0 0 100.16
30 1 0.00
51 0 0.00
380 0 0.02 /usr/bin/python -Es /usr/sbin/tuned -l -P
2016 0 0.08 sshd: root#pts/0 "" "" "" ""
2023 1 100.11 -bash
2630 0 0.04 /usr/libexec/systemtap/stapio -R stap_3020c9e7ba76838179be68cd2390a10c_2630 -F3
I understand that perf is not the proper way to do it, although you can limit perf per CPU, as using perf record -C <cpulist> or even perf stat -c <cpulist>.
The close you are going to see is the context-switch event, but, this is not going to provide you the application names at all.
I think you are going to need something more powerful, as systemtap.

Child hangs if parent crashes or exits in google_breakpad::ExceptionHandler::SignalHandler

This happens if parent crashes after cloning child process, but before sending the unblocking byte with SendContinueSignalToChild(). In this case pipe file handle remains opened and child stays infinitely blocked on read(...) within WaitForContinueSignal(). After the crash, child is adopted by init process.
Steps to reproduce:
l. Simulate parent crash in google_breakpad::ExceptionHandler::GenerateDump(CrashContext *context):
...
const pid_t child = sys_clone(
ThreadEntry, stack, CLONE_FILES | CLONE_FS | CLONE_UNTRACED, &thread_arg, NULL, NULL, NULL);
int r, status;
// Allow the child to ptrace us
sys_prctl(PR_SET_PTRACER, child, 0, 0, 0);
int *ptr = 0;
*ptr = 42; // <------- Crash here
SendContinueSignalToChild();
...
Send one of the handled signal to the parent (e.g. SIGSEGV), so that the above GenerateDump(...) method is envoked.
Observe that parent exits but child still exists, blocked on WaitForContinueSignal().
Output for the above steps:
dmytro#db:~$ ./test &
[1] 25050
dmytro#db:~$ Test: started
dmytro#db:~$ ps aflxw | grep test
0 1000 25050 18923 20 0 40712 2680 - R pts/37 0:13 | | \_ ./test
0 1000 25054 18923 20 0 6136 856 pipe_w S+ pts/37 0:00 | | \_ grep --color=auto test
dmytro#db:~$ kill -11 25050
[1]+ Segmentation fault (core dumped) ./test
dmytro#db:~$ ps aflxw | grep test
0 1000 25058 18923 20 0 6136 852 pipe_w S+ pts/37 0:00 | | \_ grep --color=auto test
1 1000 25055 1687 20 0 40732 356 pipe_w S pts/37 0:00 \_ ./test
1687 is the init pid.
In the real world the crash happens in a thread parallel to the one that handles signal.
NOTE: the issue can also happen because of normal program termination (i.e. exit(0) is called in a parallel thread).
Tested on Linux 3.3.8-2.2., mips and i686 platforms.
So, my 2 questions:
Is it the expected behavior for the breakpad library to keep child alive? My expectation is that child should exit immediately after parent crashes/exits.
If it is not expected behavior, what is the best solution to finish client after parent crash/exit?
Thanks in advance!
Any clue on possible solution?
This can also happen during shutdown crash, if the crashed thread is not main, and the parent process exits from main() in exactly this time slot, so apparently it's not that unlikely to happen as it seems at a first glance.
At this moment, I think this is happening because of CLONE_FILES flag of clone() function. This leads to the situation where read() on pipe in child is not returning EOF if parent process quits.
I have not yet done the examination if we can safely get rid of this flag in clone() call.

Tool to trace local function calls in Linux

I am looking for a tool like ltrace or strace that can trace locally defined functions in an executable. ltrace only traces dynamic library calls and strace only traces system calls. For example, given the following C program:
#include <stdio.h>
int triple ( int x )
{
return 3 * x;
}
int main (void)
{
printf("%d\n", triple(10));
return 0;
}
Running the program with ltrace will show the call to printf since that is a standard library function (which is a dynamic library on my system) and strace will show all the system calls from the startup code, the system calls used to implement printf, and the shutdown code, but I want something that will show me that the function triple was called. Assuming that the local functions have not been inlined by an optimizing compiler and that the binary has not been stripped (symbols removed), is there a tool that can do this?
Edit
A couple of clarifications:
It is okay if the tool also provides trace information for non-local functions.
I don't want to have to recompile the program(s) with support for specific tools, the symbol information in the executable should be enough.
I would be really nice if I could use the tool to attach to existing processes like I can with ltrace/strace.
Assuming you only want to be notified for specific functions, you can do it like this:
compile with debug informations (as you already have symbol informations, you probably also have enough debugs in)
given
#include <iostream>
int fac(int n) {
if(n == 0)
return 1;
return n * fac(n-1);
}
int main()
{
for(int i=0;i<4;i++)
std::cout << fac(i) << std::endl;
}
Use gdb to trace:
[js#HOST2 cpp]$ g++ -g3 test.cpp
[js#HOST2 cpp]$ gdb ./a.out
(gdb) b fac
Breakpoint 1 at 0x804866a: file test.cpp, line 4.
(gdb) commands 1
Type commands for when breakpoint 1 is hit, one per line.
End with a line saying just "end".
>silent
>bt 1
>c
>end
(gdb) run
Starting program: /home/js/cpp/a.out
#0 fac (n=0) at test.cpp:4
1
#0 fac (n=1) at test.cpp:4
#0 fac (n=0) at test.cpp:4
1
#0 fac (n=2) at test.cpp:4
#0 fac (n=1) at test.cpp:4
#0 fac (n=0) at test.cpp:4
2
#0 fac (n=3) at test.cpp:4
#0 fac (n=2) at test.cpp:4
#0 fac (n=1) at test.cpp:4
#0 fac (n=0) at test.cpp:4
6
Program exited normally.
(gdb)
Here is what i do to collect all function's addresses:
tmp=$(mktemp)
readelf -s ./a.out | gawk '
{
if($4 == "FUNC" && $2 != 0) {
print "# code for " $NF;
print "b *0x" $2;
print "commands";
print "silent";
print "bt 1";
print "c";
print "end";
print "";
}
}' > $tmp;
gdb --command=$tmp ./a.out;
rm -f $tmp
Note that instead of just printing the current frame(bt 1), you can do anything you like, printing the value of some global, executing some shell command or mailing something if it hits the fatal_bomb_exploded function :) Sadly, gcc outputs some "Current Language changed" messages in between. But that's easily grepped out. No big deal.
System Tap can be used on a modern Linux box (Fedora 10, RHEL 5, etc.).
First download the para-callgraph.stp script.
Then run:
$ sudo stap para-callgraph.stp 'process("/bin/ls").function("*")' -c /bin/ls
0 ls(12631):->main argc=0x1 argv=0x7fff1ec3b038
276 ls(12631): ->human_options spec=0x0 opts=0x61a28c block_size=0x61a290
365 ls(12631): <-human_options return=0x0
496 ls(12631): ->clone_quoting_options o=0x0
657 ls(12631): ->xmemdup p=0x61a600 s=0x28
815 ls(12631): ->xmalloc n=0x28
908 ls(12631): <-xmalloc return=0x1efe540
950 ls(12631): <-xmemdup return=0x1efe540
990 ls(12631): <-clone_quoting_options return=0x1efe540
1030 ls(12631): ->get_quoting_style o=0x1efe540
See also: Observe, systemtap and oprofile updates
Using Uprobes (since Linux 3.5)
Assuming you wanted to trace all functions in ~/Desktop/datalog-2.2/datalog when calling it with the parameters -l ~/Desktop/datalog-2.2/add.lua ~/Desktop/datalog-2.2/test.dl
cd /usr/src/linux-`uname -r`/tools/perf
for i in `./perf probe -F -x ~/Desktop/datalog-2.2/datalog`; do sudo ./perf probe -x ~/Desktop/datalog-2.2/datalog $i; done
sudo ./perf record -agR $(for j in $(sudo ./perf probe -l | cut -d' ' -f3); do echo "-e $j"; done) ~/Desktop/datalog-2.2/datalog -l ~/Desktop/datalog-2.2/add.lua ~/Desktop/datalog-2.2/test.dl
sudo ./perf report -G
Assuming you can re-compile (no source change required) the code you want to trace with the gcc option -finstrument-functions, you can use etrace to get the function call graph.
Here is what the output looks like:
\-- main
| \-- Crumble_make_apple_crumble
| | \-- Crumble_buy_stuff
| | | \-- Crumble_buy
| | | \-- Crumble_buy
| | | \-- Crumble_buy
| | | \-- Crumble_buy
| | | \-- Crumble_buy
| | \-- Crumble_prepare_apples
| | | \-- Crumble_skin_and_dice
| | \-- Crumble_mix
| | \-- Crumble_finalize
| | | \-- Crumble_put
| | | \-- Crumble_put
| | \-- Crumble_cook
| | | \-- Crumble_put
| | | \-- Crumble_bake
On Solaris, truss (strace equivalent) has the ability to filter the library to be traced. I'm was surprised when I discovered strace doesn't have such a capability.
KcacheGrind
https://kcachegrind.github.io/html/Home.html
Test program:
int f2(int i) { return i + 2; }
int f1(int i) { return f2(2) + i + 1; }
int f0(int i) { return f1(1) + f2(2); }
int pointed(int i) { return i; }
int not_called(int i) { return 0; }
int main(int argc, char **argv) {
int (*f)(int);
f0(1);
f1(1);
f = pointed;
if (argc == 1)
f(1);
if (argc == 2)
not_called(1);
return 0;
}
Usage:
sudo apt-get install -y kcachegrind valgrind
# Compile the program as usual, no special flags.
gcc -ggdb3 -O0 -o main -std=c99 main.c
# Generate a callgrind.out.<PID> file.
valgrind --tool=callgrind ./main
# Open a GUI tool to visualize callgrind data.
kcachegrind callgrind.out.1234
You are now left inside an awesome GUI program that contains a lot of interesting performance data.
On the bottom right, select the "Call graph" tab. This shows an interactive call graph that correlates to performance metrics in other windows as you click the functions.
To export the graph, right click it and select "Export Graph". The exported PNG looks like this:
From that we can see that:
the root node is _start, which is the actual ELF entry point, and contains glibc initialization boilerplate
f0, f1 and f2 are called as expected from one another
pointed is also shown, even though we called it with a function pointer. It might not have been called if we had passed a command line argument.
not_called is not shown because it didn't get called in the run, because we didn't pass an extra command line argument.
The cool thing about valgrind is that it does not require any special compilation options.
Therefore, you could use it even if you don't have the source code, only the executable.
valgrind manages to do that by running your code through a lightweight "virtual machine".
Tested on Ubuntu 18.04.
$ sudo yum install frysk
$ ftrace -sym:'*' -- ./a.out
More: ftrace.1
If you externalize that function into an external library, you should also be able to see it getting called, ( with ltrace ).
The reason this works is because ltrace puts itself between your app and the library, and when all the code is internalized with the one file it can't intercept the call.
ie: ltrace xterm
spews stuff from X libraries, and X is hardly system.
Outside this, the only real way to do it is compile-time intercept via prof flags or debug symbols.
I just ran over this app, which looks interesting:
http://www.gnu.org/software/cflow/
But I dont think thats what you want.
If the functions aren't inlined, you might even have luck using objdump -d <program>.
For an example, let's take a loot at the beginning of GCC 4.3.2's main routine:
$ objdump `which gcc` -d | grep '\(call\|main\)'
08053270 <main>:
8053270: 8d 4c 24 04 lea 0x4(%esp),%ecx
--
8053299: 89 1c 24 mov %ebx,(%esp)
805329c: e8 8f 60 ff ff call 8049330 <strlen#plt>
80532a1: 8d 04 03 lea (%ebx,%eax,1),%eax
--
80532cf: 89 04 24 mov %eax,(%esp)
80532d2: e8 b9 c9 00 00 call 805fc90 <xmalloc_set_program_name>
80532d7: 8b 5d 9c mov 0xffffff9c(%ebp),%ebx
--
80532e4: 89 04 24 mov %eax,(%esp)
80532e7: e8 b4 a7 00 00 call 805daa0 <expandargv>
80532ec: 8b 55 9c mov 0xffffff9c(%ebp),%edx
--
8053302: 89 0c 24 mov %ecx,(%esp)
8053305: e8 d6 2a 00 00 call 8055de0 <prune_options>
805330a: e8 71 ac 00 00 call 805df80 <unlock_std_streams>
805330f: e8 4c 2f 00 00 call 8056260 <gcc_init_libintl>
8053314: c7 44 24 04 01 00 00 movl $0x1,0x4(%esp)
--
805331c: c7 04 24 02 00 00 00 movl $0x2,(%esp)
8053323: e8 78 5e ff ff call 80491a0 <signal#plt>
8053328: 83 e8 01 sub $0x1,%eax
It takes a bit of effort to wade through all of the assembler, but you can see all possible calls from a given function. It's not as easy to use as gprof or some of the other utilities mentioned, but it has several distinct advantages:
You generally don't need to recompile an application to use it
It shows all possible function calls, whereas something like gprof will only show the executed function calls.
There is a shell script for automatizating tracing function calls with gdb. But it can't attach to running process.
blog.superadditive.com/2007/12/01/call-graphs-using-the-gnu-project-debugger/
Copy of the page - http://web.archive.org/web/20090317091725/http://blog.superadditive.com/2007/12/01/call-graphs-using-the-gnu-project-debugger/
Copy of the tool - callgraph.tar.gz
http://web.archive.org/web/20090317091725/http://superadditive.com/software/callgraph.tar.gz
It dumps all functions from program and generate a gdb command file with breakpoints on each function. At each breakpoint, "backtrace 2" and "continue" are executed.
This script is rather slow on big porject (~ thousands of functions), so i add a filter on function list (via egrep). It was very easy, and I use this script almost evry day.
Gprof might be what you want
See traces, a tracing framework for Linux C/C++ applications:
https://github.com/baruch/traces#readme
It requires recompiling your code with its instrumentor, but will provide a listing of all functions, their parameters and return values. There's an interactive to allow easy navigation of large data samples.
Hopefully the callgrind or cachegrind tools for Valgrind will give you the information you seek.
NOTE: This is not the linux kernel based ftrace, but rather a tool I recently designed to accomplish local function tracing and control flow. Linux ELF x86_64/x86_32 are supported publicly.
https://github.com/leviathansecurity/ftrace

Resources