Both strace and ftrace seem to be used for tracing function calls in Linux. What's the difference?
strace is a utility which allows you to trace the system calls that an application makes. When an application makes a system call, it is basically asking the kernel to do something, e.g. access a file. Use the command man strace to get the strace documentation and man syscalls for information on system calls.
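For a feel of what strace shows, here is a minimal sketch (the file name hello.c and the path /etc/hostname are just examples): a tiny C program and the syscalls you would expect to see when running it under strace ./hello.

    /* hello.c - a trivial program to examine with strace.
     * Build: gcc -o hello hello.c
     * Trace: strace ./hello
     * Expect openat(), read(), write(), close(), exit_group() in the output. */
    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        char buf[64];
        int fd = open("/etc/hostname", O_RDONLY); /* shows up as openat() */
        if (fd >= 0) {
            ssize_t n = read(fd, buf, sizeof buf); /* read() */
            if (n > 0)
                write(1, buf, (size_t)n);          /* write() to stdout */
            close(fd);                             /* close() */
        }
        return 0;
    }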
ftrace is a tool used during kernel development that allows the developer to see what functions are being called within the kernel; a usage sketch follows the excerpt below. The documentation in the kernel source tree (Documentation/trace/ftrace.rst) states:
Ftrace is an internal tracer designed to help out developers and
designers of systems to find what is going on inside the kernel.
It can be used for debugging or analyzing latencies and
performance issues that take place outside of user-space.
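As a rough usage sketch (assumptions: tracefs is mounted under /sys/kernel/debug/tracing, which varies by distribution, and the program runs as root), the function tracer can be driven entirely through tracefs control files:

    /* ftrace_demo.c - sketch: turn on the ftrace function tracer via
     * tracefs, let the kernel run for a moment, then dump the first
     * lines of the trace buffer. Needs root; the path is an assumption. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define TRACEFS "/sys/kernel/debug/tracing"

    static void write_file(const char *path, const char *val)
    {
        FILE *f = fopen(path, "w");
        if (!f) { perror(path); exit(1); }
        fputs(val, f);
        fclose(f);
    }

    int main(void)
    {
        char line[512];

        write_file(TRACEFS "/current_tracer", "function"); /* pick a tracer */
        write_file(TRACEFS "/tracing_on", "1");            /* start tracing */
        sleep(1);                                          /* gather activity */
        write_file(TRACEFS "/tracing_on", "0");            /* stop tracing */

        FILE *f = fopen(TRACEFS "/trace", "r");            /* read results */
        if (!f) { perror(TRACEFS "/trace"); return 1; }
        for (int i = 0; i < 20 && fgets(line, sizeof line, f); i++)
            fputs(line, stdout);
        fclose(f);
        return 0;
    }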
I have a requirement to write a Linux device driver in user space. How can I write a library which, when linked to an application, can handle system calls to a particular device? The application should be able to use open(), read(), write(), and ioctl() on a device such as /dev/mydev0, but these calls should terminate in a user-space library instead of a kernel module. Please advise on whether this is possible and how I can achieve it.
Linux is a monolithic kernel, which means that, in general, what you're asking is not possible; you can't write arbitrary drivers in user mode.
You could (as your title alludes to) use ptrace(2) to trap on system calls and redirect them to functions in your library. That is not a simple, straightforward solution, however.
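To give a flavor of that approach, here is a hedged sketch (x86-64 only; a real redirector would also rewrite syscall arguments and return values, which this does not attempt):

    /* ptrace_demo.c - stop a child at every syscall entry/exit and print
     * the syscall number (x86-64: held in orig_rax). Each syscall produces
     * two stops, one on entry and one on exit.
     * Build: gcc -o ptrace_demo ptrace_demo.c ; run: ./ptrace_demo /bin/ls */
    #include <stdio.h>
    #include <sys/ptrace.h>
    #include <sys/user.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(int argc, char *argv[])
    {
        if (argc < 2) {
            fprintf(stderr, "usage: %s prog [args...]\n", argv[0]);
            return 1;
        }

        pid_t child = fork();
        if (child == 0) {
            ptrace(PTRACE_TRACEME, 0, NULL, NULL); /* ask to be traced */
            execvp(argv[1], &argv[1]);
            perror("execvp");
            _exit(1);
        }

        int status;
        waitpid(child, &status, 0);                /* child stops at execvp */
        while (!WIFEXITED(status)) {
            struct user_regs_struct regs;
            ptrace(PTRACE_GETREGS, child, NULL, &regs);
            printf("syscall %llu\n", (unsigned long long)regs.orig_rax);
            ptrace(PTRACE_SYSCALL, child, NULL, NULL); /* run to next stop */
            waitpid(child, &status, 0);
        }
        return 0;
    }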
See also:
How to use ptrace(2) to change behaviour of syscalls?
FUSE (Filesystem in USErspace) may be what you're looking for. It's a mechanism that allows filesystem drivers specifically to be implemented in a user-space process; this is how sshfs, for example, is implemented. A minimal sketch follows the resource link below.
Resources:
http://fuse.sourceforge.net/
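As a taste of the API, here is a minimal hedged sketch of a read-only filesystem exposing a single file (written against the FUSE 2.x API; the name hellofs and the file contents are invented):

    /* hellofs.c - minimal read-only FUSE filesystem exposing one file.
     * Build: gcc -o hellofs hellofs.c `pkg-config fuse --cflags --libs`
     * Mount: ./hellofs /mnt/point ; then: cat /mnt/point/hello */
    #define FUSE_USE_VERSION 26
    #include <fuse.h>
    #include <string.h>
    #include <errno.h>
    #include <sys/stat.h>

    static const char *msg = "hello from user space\n";

    static int fs_getattr(const char *path, struct stat *st)
    {
        memset(st, 0, sizeof *st);
        if (strcmp(path, "/") == 0) {
            st->st_mode = S_IFDIR | 0755;   /* the mount point itself */
            st->st_nlink = 2;
        } else if (strcmp(path, "/hello") == 0) {
            st->st_mode = S_IFREG | 0444;   /* one read-only file */
            st->st_nlink = 1;
            st->st_size = strlen(msg);
        } else {
            return -ENOENT;
        }
        return 0;
    }

    static int fs_readdir(const char *path, void *buf, fuse_fill_dir_t fill,
                          off_t off, struct fuse_file_info *fi)
    {
        (void)off; (void)fi;
        if (strcmp(path, "/") != 0)
            return -ENOENT;
        fill(buf, ".", NULL, 0);
        fill(buf, "..", NULL, 0);
        fill(buf, "hello", NULL, 0);
        return 0;
    }

    static int fs_read(const char *path, char *buf, size_t size, off_t off,
                       struct fuse_file_info *fi)
    {
        size_t len = strlen(msg);
        (void)fi;
        if (strcmp(path, "/hello") != 0)
            return -ENOENT;
        if ((size_t)off >= len)
            return 0;                       /* EOF */
        if (off + size > len)
            size = len - off;
        memcpy(buf, msg + off, size);       /* the app's read() gets this */
        return (int)size;
    }

    static struct fuse_operations ops = {
        .getattr = fs_getattr,
        .readdir = fs_readdir,
        .read    = fs_read,
    };

    int main(int argc, char *argv[])
    {
        return fuse_main(argc, argv, &ops, NULL);
    }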
I have read "system call APIs are for user-space access and
system APIs are for system space access". I am new to Linux OS concepts, I don't have any knowledge about the System API. Can anyone explain the difference between these two?
A system call is an explicit request to the kernel, traditionally made via a software interrupt. It is the lowest-level interface to the operating system: a system call is how your program asks the kernel to do something. System calls are intentionally very low-level interfaces, each exposing a specific piece of functionality that your program cannot accomplish on its own.
A system API, by contrast, consists of the library functions used to invoke system calls; the sketch below shows both levels.
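A small hedged sketch of the two levels side by side: the libc wrapper (the "system API" level) and the same request made through the raw syscall(2) entry point:

    /* two_levels.c - the same kernel request made two ways. */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/syscall.h>

    int main(void)
    {
        pid_t a = getpid();              /* the libc wrapper ("system API") */
        long  b = syscall(SYS_getpid);   /* the raw system call, by number  */
        printf("getpid() = %d, syscall(SYS_getpid) = %ld\n", (int)a, b);
        return 0;
    }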
Read the system call and Linux kernel wikipages first.
As Rahul Triparhi answered, system calls are the elementary operations, as seen from a user-mode application. Use strace(1) to find out which syscalls are made by some program.
The system calls are well documented in section 2 of the man pages (first type man man in a terminal on your Linux system). So read intro(2) and then syscalls(2).
Strictly speaking, syscalls have an interface, notably specified in ABI documents like the x86-64 ABI, defined at the lowest possible machine level, in terms of machine instructions, registers, etc. The functions in section 2 are tiny C wrappers above them; see the sketch below. See also the Linux Assembly HOWTO.
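As a hedged illustration of that machine level (x86-64 Linux only, gcc/clang inline assembly), here is write(2) invoked directly through the registers the ABI prescribes:

    /* rawwrite.c - invoke write(2) directly through the x86-64 syscall ABI:
     * rax = syscall number (1 = write), rdi/rsi/rdx = the three arguments;
     * the kernel clobbers rcx and r11; the result comes back in rax. */
    int main(void)
    {
        static const char msg[] = "hi from a raw syscall\n";
        long ret;
        __asm__ volatile ("syscall"
                          : "=a"(ret)                     /* rax: return value */
                          : "a"(1L),                      /* rax: __NR_write   */
                            "D"(1L),                      /* rdi: fd = stdout  */
                            "S"(msg),                     /* rsi: buffer       */
                            "d"((long)(sizeof msg - 1))   /* rdx: count        */
                          : "rcx", "r11", "memory");
        return ret < 0;
    }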
Please also read Advanced Linux Programming, which explains many of them quite well.
BTW, I am not sure that "System API" has a well-defined meaning, even if I can guess what it could be. See also the several answers to this question.
Probably "System API" refers to the many functions standardized by POSIX, implemented in the POSIX C library such as GNU libc (but you could use some other libc on Linux, like MUSL libc, if you really wanted to). I am thinking of functions like dlopen (to dynamically load a plugin) or getaddrinfo(3) (to get information about some network things) etc... The Linux implementation (e.g. dlopen(3)) is providing a super-set of it.
More generally, section 3 of the man pages (see intro(3)) provides a lot of library functions, most of them built above system calls: dlopen actually calls the mmap(2) syscall, and getaddrinfo may use syscalls to connect to some server (see nsswitch.conf(5)), etc. But some library functions, like snprintf(3), sqrt(3), or longjmp(3), probably make no syscall at all; they just do internal computations without needing any additional kernel service.
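A quick way to see this for yourself (a sketch; libm.so.6 is an example library name that may differ per system): run the following under strace and watch dlopen expand into openat() and mmap() syscalls, while the sqrt call itself makes none:

    /* dl_demo.c - load libm at run time and call sqrt through dlsym.
     * Build: gcc -o dl_demo dl_demo.c -ldl
     * Under strace, dlopen shows up as openat() + mmap() syscalls,
     * while the sqrt call itself makes no syscall at all. */
    #include <stdio.h>
    #include <dlfcn.h>

    int main(void)
    {
        void *h = dlopen("libm.so.6", RTLD_NOW);  /* library name: an example */
        if (!h) {
            fprintf(stderr, "%s\n", dlerror());
            return 1;
        }

        double (*sq)(double) = (double (*)(double))dlsym(h, "sqrt");
        if (sq)
            printf("sqrt(2) = %f\n", sq(2.0));    /* pure user-space math */

        dlclose(h);
        return 0;
    }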
Is it possible to trace all the system calls/interrupts generated by KVM to interact with hardware? I know there are tools like strace which can trace all the system calls generated by any C program, but how do you do the same if you want to get all the system calls for a hypervisor?
The user-space part (e.g. QEMU) is a regular process, so strace will work for syscall tracing. The kernel-space part (KVM) may be traced via SystemTap or a similar tool.
I am new to the Linux kernel and wondering how to browse the complete flow, right from CPU power-up, including a basic idea of the BIOS/ROM code. Can I use some tool to debug the complete kernel, or is raw code browsing preferable?
The following tools may help you debug the Linux kernel:
Dynamic Probes is one of the popular debugging tools for Linux, developed by IBM. This tool allows the placement of a "probe" at almost any place in the system, in both user and kernel space. The probe consists of some code (written in a specialized, stack-oriented language) that is executed when control hits the given point. Resources regarding dprobes/kprobes are listed below; a minimal kprobes module sketch follows the list.
http://www-01.ibm.com/support/knowledgecenter/linuxonibm/liaax/dprobesltt.pdf
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.107.6212&rep=rep1&type=pdf
https://www.redhat.com/magazine/005mar05/features/kprobes/
https://sourceware.org/systemtap/kprobes/
http://www.ibm.com/developerworks/library/l-kprobes/index.html
https://doc.opensuse.org/documentation/html/openSUSE_121/opensuse-tuning/cha.tuning.kprobes.html
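For a feel of the modern kprobes interface, here is a hedged sketch of an out-of-tree kernel module (kernel APIs evolve across versions, and the probed symbol do_sys_open is just an example that may not exist in your kernel):

    /* kprobe_demo.c - minimal kernel module placing a kprobe on a kernel
     * function and logging every hit. Built out of tree with a standard
     * kernel-module Makefile. The probed symbol is just an example. */
    #include <linux/module.h>
    #include <linux/kernel.h>
    #include <linux/kprobes.h>

    static struct kprobe kp = {
        .symbol_name = "do_sys_open",   /* example symbol; pick one in your kernel */
    };

    static int handler_pre(struct kprobe *p, struct pt_regs *regs)
    {
        pr_info("kprobe hit: %s\n", p->symbol_name);  /* runs before the probed insn */
        return 0;
    }

    static int __init kp_init(void)
    {
        kp.pre_handler = handler_pre;
        return register_kprobe(&kp);    /* 0 on success */
    }

    static void __exit kp_exit(void)
    {
        unregister_kprobe(&kp);
    }

    module_init(kp_init);
    module_exit(kp_exit);
    MODULE_LICENSE("GPL");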
Linux Trace Toolkit is a kernel patch and a set of related utilities that allow the tracing of events in the kernel. The trace includes timing information and can create a reasonably complete picture of what happened over a given period of time. Resources for LTT, the LTT Viewer, and LTT Next Generation:
http://elinux.org/Linux_Trace_Toolkit
http://www.linuxjournal.com/article/3829
http://multivax.blogspot.com/2010/11/introduction-to-linux-tracing-toolkit.html
MEMWATCH is an open-source memory-error detection tool. It works by defining MEMWATCH on the gcc command line and adding its header file to your code. Through this you can track memory leaks and memory corruption. Resources regarding MEMWATCH:
http://www.linuxjournal.com/article/6059
ftrace is a good tracing framework for the Linux kernel. ftrace traces internal operations of the kernel. This tool was included in the Linux kernel in 2.6.27. With its various tracer plugins, ftrace can be targeted at different static tracepoints, such as scheduling events, interrupts, memory-mapped I/O, CPU power-state transitions, and operations related to file systems and virtualization. Dynamic tracing of kernel function calls is also available, optionally restricted to a subset of functions using globs, with the possibility to generate call graphs and report stack usage. You can find a good tutorial on ftrace at https://events.linuxfoundation.org/slides/2010/linuxcon_japan/linuxcon_jp2010_rostedt.pdf
ltrace is a debugging utility in Linux used to display the calls a user-space application makes to shared libraries. This tool can trace any dynamic library function call: it intercepts and records the dynamic library calls made by the traced process and the signals received by that process. It can also intercept and print the system calls executed by the program. Resources regarding ltrace:
http://www.ellexus.com/getting-started-with-ltrace-how-does-it-do-that/
http://developerblog.redhat.com/2014/07/10/ltrace-for-rhel-6-and-7/
KDB is the in-kernel debugger of the Linux kernel. KDB provides a simplistic shell-style interface. You can use it to inspect memory, registers, process lists, and dmesg, set breakpoints to stop at a certain location, and perform some basic kernel run control (although KDB is not a source-level debugger). Several handy resources regarding KDB:
http://www.drdobbs.com/open-source/linux-kernel-debugging/184406318
http://elinux.org/KDB
http://dev.man-online.org/man1/kdb/
https://www.kernel.org/pub/linux/kernel/people/jwessel/kdb/usingKDB.html
KGDB is intended to be used as a source-level debugger for the Linux kernel. It is used along with gdb to debug a Linux kernel. Two machines are required for using kgdb: a development machine and a target machine. The kernel to be debugged runs on the target machine. The expectation is that gdb can be used to "break in" to the kernel to inspect memory and variables and to look through call stack information, similar to the way an application developer would use gdb to debug an application. It is possible to place breakpoints in kernel code and perform some limited execution stepping. A handy resource regarding KGDB:
http://landley.net/kdocs/Documentation/DocBook/xhtml-nochunks/kgdb.html
First, see the related question Linux kernel live debugging, how it's done and what tools are used?. Try KDB or ftrace.
If your intention is to understand the whole flow of the Linux kernel, running the Linux kernel on QEMU can be an easy way to learn how Linux works; in particular, you can emulate many CPU types without real hardware. Or how about User-Mode Linux?
This document can be helpful for debugging the kernel on QEMU.
Just adding: the Linux kernel is not very suitable for debugging. Linus Torvalds once stated that he's against supporting kernel debugging in Linux because it leads to badly written code.
I used kdbg, but I didn't find it very useful. What I suggest is to debug the kernel the old-school way, using printk.
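For completeness, a minimal sketch of the printk approach in a module (the module name demo and the log level are arbitrary); the output lands in the kernel log, readable with dmesg:

    /* printk_demo.c - the old-school approach: sprinkle printk() at the
     * points of interest and read the messages with dmesg. */
    #include <linux/module.h>
    #include <linux/kernel.h>

    static int __init demo_init(void)
    {
        printk(KERN_DEBUG "demo: loaded, tracing my code path here\n");
        return 0;
    }

    static void __exit demo_exit(void)
    {
        printk(KERN_DEBUG "demo: unloaded\n");
    }

    module_init(demo_init);
    module_exit(demo_exit);
    MODULE_LICENSE("GPL");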
I would like to monitor the context-switching behavior of a multi-threaded pthread application.
In other RTOSes (Micro C OS) I have been able to register a context-switch callback for each thread in the application, and then log (or toggle a GPIO) and watch the thread context switching in real time. This was a valuable tool for debugging the real-time behavior and interaction of the multiple threads.
My current environment is embedded Linux using the pthread API. Is there a way to monitor each of the context switches?
Not in the way you describe; however, there are various profiling tools for Linux, like OProfile, SystemTap, and perf events. I'm not sure how well they'd fit into embedded development, though.
EDIT: perf is probably best (if you are running a recent enough kernel to use it) since it's in the mainline, so you just have to turn it on, and it's really basic.
EDIT: if none of those work for you, you can always modify the kernel context-switch code...
EDIT: I missed one of the tracing frameworks: there is also LTTng.
If you're using BusyBox and can compile your own kernel, perf is probably the most minimal way to go: it consists of turning on perf events in the kernel and compiling the perf tool that comes with the kernel source (it's in tools/perf).
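If none of those tools fit your target, one hedged alternative sketch (a coarser technique than a per-switch callback, named plainly: polling counters rather than tracing events): on Linux each thread can read its own voluntary and involuntary context-switch counts via getrusage(RUSAGE_THREAD), with no kernel modification. The thread count and loop sizes below are arbitrary:

    /* ctxsw.c - per-thread context-switch counters via getrusage(RUSAGE_THREAD).
     * Linux-specific (RUSAGE_THREAD needs _GNU_SOURCE).
     * Build: gcc -pthread -o ctxsw ctxsw.c */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <pthread.h>
    #include <sys/resource.h>
    #include <unistd.h>

    static void *worker(void *arg)
    {
        volatile unsigned long x = 0;
        (void)arg;
        for (int i = 0; i < 5; i++) {
            for (unsigned long j = 0; j < 50000000UL; j++)
                x += j;                     /* burn CPU: involuntary switches */
            usleep(1000);                   /* sleep: a voluntary switch */
        }

        struct rusage ru;
        getrusage(RUSAGE_THREAD, &ru);      /* counters for THIS thread only */
        printf("thread %lu: voluntary=%ld involuntary=%ld\n",
               (unsigned long)pthread_self(), ru.ru_nvcsw, ru.ru_nivcsw);
        return NULL;
    }

    int main(void)
    {
        pthread_t t[4];
        for (int i = 0; i < 4; i++)
            pthread_create(&t[i], NULL, worker, NULL);
        for (int i = 0; i < 4; i++)
            pthread_join(t[i], NULL);
        return 0;
    }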