Performance Evaluation of Linux Scheduler - linux

I have made some simple changes to the scheduler in the Linux kernel. Now I would like to see how those changes affect the response time of the system; in other words, I would like to know how long a context switch takes with my modifications compared to the original scheduler. A straightforward approach would be to read the time stamp counter and then use printk to output how long the context switch took; obviously, in this case a lot of information gets printed out. So I wonder: is there any other, better approach to measuring the Linux scheduler's response time?
Thanks

There are several kernel-level trace frameworks, which might help you. See the Kernel Trace Systems page on eLinux.org for a nice overview of the available options.
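For the specific case in the question, one low-overhead variant of the timestamp-counter idea is to log with trace_printk() instead of printk(): it writes into the ftrace ring buffer rather than the console, and the buffer can be read afterwards from /sys/kernel/debug/tracing/trace. A minimal sketch (the sched_stamp() helper and its placement in the context-switch path are illustrative, not an existing kernel API):

    /* Illustrative only: meant to sit somewhere in kernel/sched/,
     * where the needed headers are already pulled in anyway. */
    #include <linux/kernel.h>
    #include <linux/ktime.h>

    static inline void sched_stamp(const char *tag)
    {
            u64 now = ktime_get_ns();       /* monotonic time in nanoseconds */

            /* trace_printk() logs to the ftrace ring buffer, which is far
             * cheaper than printk() to the console */
            trace_printk("%s %llu\n", tag, now);
    }

    /* call sched_stamp("ctx_in") / sched_stamp("ctx_out") around the
     * context-switch path you modified, then diff the timestamps offline */

Pairs of these timestamps give per-switch latencies without flooding the console; the trace frameworks mentioned above (ftrace's sched_switch events in particular) give you much the same data with no source changes at all.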

Related

Profiling methods for highly time sensitive applications

I am working in an embedded Linux environment debugging a highly timing sensitive issue related to the pairing/binding of Zigbee devices.
Our architecture is such that data is read from the Zigbee front-end module via the SPI interface and then passed from kernel space to user space for processing. The processed data and response are then passed back to kernel space and clocked out over the SPI interface again.
The Zigbee 802.15.4 timing requirements specify that we need to respond within 19.5 ms, and we frequently have situations where we respond just outside this window, which results in a failure and packet loss on the network.
The Linux kernel is not running with preemption enabled, and it may not be possible to enable preemption either.
My suspicion is that since the kernel is not preemptible there is another task/process which is using the ioctl() interface and this holds off the Zigbee application just long enough that the 19.5ms window is exceeded.
I have tried the following tools:
oprofile - not much help here, since it profiles the entire system, and the application is not actually very busy during this time because it moves such small amounts of data
strace - too much overhead; I don't have much experience using it, though, so maybe the output can be refined. The overhead affects the performance so much that the application does not function at all
Are there any other lightweight methods of profiling a system like this?
Is there any way to catch when an ioctl call is pending on another task/thread? (assuming this is the root cause of the issue)
Good question.
Here's an idea. Don't think of it as profiling.
Think of catching it in the act.
I would investigate creating a watchdog timer to go off after the 16.5ms interval.
Whenever you are successful, reset the timer.
That way, it will only go off when there's a failure.
At that point, I would try to take a stack sample of the process, or possibly another process that might be blocking it.
That's an adaptation of this technique.
It will take some work, but I'd be surprised if there's any tool that will tell you exactly what's going on, short of an in-circuit-emulator.
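A rough user-space sketch of that watchdog idea, using a POSIX one-shot timer: arm it when a request arrives, disarm it when the response goes out in time, and let the signal handler capture state only on a miss. The deadline constant, the signal choice and the handler body are all placeholders to adapt to the actual application (compile with -lrt on older glibc):

    #include <signal.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    #define DEADLINE_NS 19500000L   /* 19.5 ms budget -- placeholder, tune it */

    static timer_t wd_timer;

    /* Fires only when a response missed the deadline. abort() is
     * async-signal-safe and leaves a core dump to inspect; you could
     * instead log /proc/self/stack or poke an external tracer here. */
    static void wd_expired(int sig)
    {
            (void)sig;
            abort();
    }

    static void wd_set(long ns)     /* ns == 0 disarms the timer */
    {
            struct itimerspec its;

            memset(&its, 0, sizeof(its));
            its.it_value.tv_nsec = ns;      /* one-shot, no interval */
            timer_settime(wd_timer, 0, &its, NULL);
    }

    int main(void)
    {
            struct sigaction sa;
            struct sigevent sev;

            memset(&sa, 0, sizeof(sa));
            sa.sa_handler = wd_expired;
            sigaction(SIGRTMIN, &sa, NULL);

            memset(&sev, 0, sizeof(sev));
            sev.sigev_notify = SIGEV_SIGNAL;
            sev.sigev_signo = SIGRTMIN;
            timer_create(CLOCK_MONOTONIC, &sev, &wd_timer);

            for (;;) {
                    /* ... wait for the next Zigbee request ... */
                    wd_set(DEADLINE_NS);    /* the clock starts ticking */
                    /* ... process and send the response ... */
                    wd_set(0);              /* made it in time: reset the timer */
            }
    }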
LTTng is the tool you are looking for. Like Oprofile, it profiles the entire system, but you will be able to see exactly what is going on with each process and kernel thread, in a timeline fashion. You will be able to view the interaction of the threads and scheduler around the point of interest, that is, when you miss your Zigbee deadline. You may have to get clever and use some method of triggering (or more likely, stopping) the LTTng trace once you've detected the missed packet, or you might get lucky and catch it right away just using the command line tools to start and stop tracing.
You may have to do some work to get there; for example, you'll have to invest some time and energy in 1) enabling your kernel to run LTTng if it doesn't have it already, and 2) learning how to use it. It is a powerful tool, and useful for a variety of profiling and analysis tasks. Most commercial embedded Linux vendors have complete end-to-end LTTng products and configuration if you have that option. If not, you should be able to find plenty of useful help and examples online. LTTng has been around for a very long time! Happy hunting!
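As a sketch of the "stop the trace when you detect the miss" idea: assuming a session has already been created and started with the lttng command-line tools (the session name "zigbee" here is made up), the application can simply freeze recording the moment it notices the deadline was blown, so the interesting window stays in the buffer:

    #include <stdlib.h>

    /* Hypothetical hook, called from the Zigbee app when a response misses
     * the 19.5 ms window. "zigbee" is a made-up session name created earlier
     * with the lttng CLI; liblttng-ctl is the programmatic alternative to
     * shelling out. */
    static void on_missed_deadline(void)
    {
            system("lttng stop zigbee");    /* stop recording, keep the data */
    }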

Linux' hrtimer - microsecond precision?

Is it possible to execute tasks on a Linux host with microsecond precision? I.e., I'd like to execute a task at a specific instant in time. I know Linux is not a real-time system, but I'm searching for the best solution on Linux.
So far, I've created a kernel module, set up an hrtimer, and measured the jitter when the callback function is entered (I don't really care too much about the actual delay; it's the jitter that counts) - it's about 20-50us. That's not significantly better than using timerfd in userspace (I also tried using real-time priority for the process, but that did not really change anything).
I'm running Linux 3.5.0 (just an example, tried different kernels from 2.6.35 to 3.7), /proc/timer_list shows hrtimer_interrupt, I'm not running in failsafe mode which disables hrtimer functionality. Tried on different CPUs (Intel Atom to Core i7).
My best idea so far would be using hrtimer in combination with ndelay/udelay. Is this really the best way to do it? I can't believe it's not possible to trigger a task with microsecond precision. Running the code in kernel space as a module is acceptable; it would be great if the code was not interrupted by other tasks, though. I don't really care too much about the rest of the system; the task will be executed only a few times a second, so using mdelay/ndelay to burn the CPU for some microseconds every time the task should be executed would not really matter. Although, I'd prefer a more elegant solution.
I hope the question is clear, found a lot of topics concerning timer precision but no real answer to that problem.
You can do what you want from user space:
use clock_gettime() with CLOCK_REALTIME to get the time of day with nanosecond resolution
use nanosleep() to yield the CPU until you are close to the time you need to execute your task (don't count on it waking you with better than millisecond accuracy)
use a spin loop with clock_gettime() until you reach the desired time
execute your task
The clock_gettime() function is implemented via the vDSO in recent kernels on modern x86 processors - it takes 20-30 nanoseconds to get the time of day with nanosecond resolution, so you should be able to call clock_gettime() over 30 times per microsecond. Using this method your task should dispatch within 1/30th of a microsecond of the intended time.
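A minimal sketch of that recipe (the run_at() helper and the 100 us guard band before the spin are illustrative choices, not anything prescribed above):

    #include <time.h>

    #define GUARD_NS 100000L                /* stop sleeping ~100 us early */

    static long long ns_of(const struct timespec *t)
    {
            return (long long)t->tv_sec * 1000000000LL + t->tv_nsec;
    }

    static void run_at(const struct timespec *target, void (*task)(void))
    {
            struct timespec now;

            clock_gettime(CLOCK_REALTIME, &now);

            /* coarse wait: yield the CPU until we are almost there */
            long long remain = ns_of(target) - ns_of(&now) - GUARD_NS;
            if (remain > 0) {
                    struct timespec ts = { remain / 1000000000LL,
                                           remain % 1000000000LL };
                    nanosleep(&ts, NULL);
            }

            /* fine wait: spin on the clock for the last few microseconds */
            do {
                    clock_gettime(CLOCK_REALTIME, &now);
            } while (ns_of(&now) < ns_of(target));

            task();                         /* dispatch as close as we can get */
    }

clock_nanosleep() with TIMER_ABSTIME is a slightly cleaner way to do the coarse wait against an absolute target time.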
The default Linux kernel timer tick is on the order of a millisecond. Microsecond precision is well beyond what a stock kernel on commodity user hardware will deliver reliably.
The jitter you see is due to a host of factors, like interrupt handling and servicing higher-priority tasks. You can cut it down somewhat by selecting hardware carefully and only enabling what is really needed. The real-time patch series for the kernel (see the HOWTO) might be an option to reduce it a bit further.
Always keep in mind that any gain has a definite cost in terms of interactiveness, stability, and (last, but by far not least) your time in building, tuning, troubleshooting, and keeping the house of cards from falling apart.

Record thread events

Suppose I need to peek at a thread's state at regular intervals and record its state along the whole execution of a program. I wouldn't know how to start thinking about this. Any pointers (pun?)? I'm on Linux, using gcc, pthreads and C, and have access to all the usual Linux tools. Basically, I guess I'm asking how to build a simple profiler for threads that will tell me how long a thread has been in some or other state during the execution of the program.
I want to be able to create graphs like Threadscope does. The X axis is time, the Y axis is core/thread number and the "colors" are state: green means running, orange is garbage collection, and so on. Does this make more sense now?
For a Linux-specific solution, you might like to have a look at /proc/<pid>/stat and /proc/<pid>/task/<tid>/stat for process and thread statistics, respectively. Have a look at the proc(5) manual page for a full description of all the fields there (online at http://man7.org/linux/man-pages/man5/proc.5.html - search for /proc/[pid]/stat). Specifically, at least the fields utime and stime are of interest to you. These are monotonically increasing times, so you need to remember the previously measured values to work out the time spent in the process/thread during a given time slice and produce the data for your graphs. (This is how top(1) works.)
However, having the profiler distinguish different states makes the problem more complicated. How does the profiler determine which state the profiled program is in? It seems to me the profiled program's threads need to signal this in some way to the profiler. You need some kind of tailored solution for this state sharing (unless you can run the different states in different threads and make the distinction that way, which I doubt).
If the state transitions are done in single place (e.g. enter GC and leave GC in your example), then one way would be as follows:
The monitored threads would get the start and end times of the special states by using POSIX function clock_gettime() - with clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &tp) you can get the process time and with clock_gettime(CLOCK_THREAD_CPUTIME_ID, &tp) you can get the thread time (both monotonically increasing, again).
The thread could communicate these timings to the profiler program with some kind of IPC.
If the profiler application knows the thread times of entering and leaving a state, then because it knows the thread time values at the change of measuring slices, it can determine how much of the thread time is spent in the reported states within a reporting time slice (and of course here we need to adjust the start time for a state to equal the start of the next reporting time slice).
The time the whole process has spent on a specific state can be calculated by summing up the thread times for that state.
Note that through /proc/<pid>/stat or /proc/<pid>/task/<tid>/stat, the measurement accuracy is not very good (clock ticks, often units of 10ms), but I do not know another way of getting timing information from outside the process/thread. The function clock_gettime() gives very accurate times (nominally nanosecond accuracy, but note that at least on some MIPS and ARM systems the accuracy is as bad as with the stat files under /proc, because the Linux kernel lacks an accurate timer-reading implementation for these fields there). You would also need to do some experimentation to make sure these two timing sources really give the same results (by reading both values from the same threads). You can of course use these /proc/.../stat files inside the thread, but the accuracy just is not very good unless you spend a lot of time within a state.
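To make the /proc side concrete, here is a minimal sketch of the sampling described above; read_thread_cpu() is just an illustrative helper name, error handling is minimal, and the field positions simply follow proc(5):

    #include <stdio.h>
    #include <string.h>
    #include <sys/types.h>
    #include <unistd.h>

    static int read_thread_cpu(pid_t pid, pid_t tid,
                               unsigned long *utime, unsigned long *stime)
    {
            char path[64], buf[512];
            FILE *f;
            char *p;

            snprintf(path, sizeof(path), "/proc/%d/task/%d/stat",
                     (int)pid, (int)tid);
            f = fopen(path, "r");
            if (!f)
                    return -1;
            if (!fgets(buf, sizeof(buf), f)) {
                    fclose(f);
                    return -1;
            }
            fclose(f);

            p = strrchr(buf, ')');          /* comm may contain spaces */
            if (!p)
                    return -1;

            /* skip: state ppid pgrp session tty_nr tpgid flags
             *       minflt cminflt majflt cmajflt -> then utime stime */
            if (sscanf(p + 1,
                       " %*c %*d %*d %*d %*d %*d %*u %*lu %*lu %*lu %*lu %lu %lu",
                       utime, stime) != 2)
                    return -1;

            /* values are in clock ticks: divide by sysconf(_SC_CLK_TCK)
             * to convert to seconds */
            return 0;
    }

Deltas between successive samples give the per-slice CPU time for the graphs; the monitored threads themselves would report their state changes with clock_gettime(CLOCK_THREAD_CPUTIME_ID, ...) as described above.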
Well, the most direct match to the profiling info produced by the Haskell compiler and processed by Threadscope is, using C and GCC, the gprof utility (it's part of GNU binutils).
For it to work correctly with pthreads you need each thread to trigger some timer initialization function. This can be done without modifying your code with this pthreads wrapper library: http://sam.zoy.org/writings/programming/gprof.html . I haven't dealt with the problem recently; it may be that something has changed and the wrapper isn't needed anymore...
As for a GUI to interpret the profiling results, there is kprof (http://kprof.sourceforge.net). Unfortunately, AFAIK it doesn't produce thread duration graphs; for that you'll have to work out your own solution with the textual info produced by gprof.
If you are not picky about using the "standard" solution offered by the GCC, you may wanna try this: http://code.google.com/p/gperftools/?redir=1 (didn't try it personally, but heard good opinions).
Good luck!
Take a look at Intel VTune Amplifier XE (formerly … Intel Thread Profiler) to see if it will meet your needs.
This and other Intel Linux development tools are available free for non-commercial use.
In the video Using the Timeline in Intel VTune Amplifier XE showing a timeline of a multi-threaded application, at 9:20 the presenter mentions
"...with the frame API you can programmatically mark certain events or phases in your code. And these marks will appear on the timeline."
I think it will be rather difficult to build a simple profiler, simply because there are many different factors that you have to consider, and system profiling is an inherently complex task, made all the more so when you are profiling a multithreaded application. The best advice I can think of is to look at something that already exists, for example OProfile.
One advantage of OProfile is that it is open source so the source code is available. But beyond this I suspect that asking how to build a profiling application might be beyond the scope of what someone can answer in a SO question, which might be why this question hasn't gotten very many responses. Hopefully looking at some example will help you get started and then perhaps if you have more focused questions you could get some more detailed responses.

Is it possible to know how much CPU usage the current thread is getting?

I am writing a task scheduler for offloaded tasks in a game engine, and I want it to tune itself based on a few heuristics.
Is it possible to know for how long the current thread has been executing between two points in time? I want to time how long tasks take to execute, and I would like that time to exclude thread switching, for multiple reasons (it's a more accurate measurement, plus it would be useful to know how much my threads are being switched out).
I would like a solution for Linux, but a Windows solution would also be appreciated.
Try to look at /proc/[pid]/task/[tid]/stat
Format is similar to /proc/[pid]/stat and explained here: http://www.kernel.org/doc/man-pages/online/pages/man5/proc.5.html
Example code from pthread_getcpuclockid() man page indicates that you could use clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts).
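A minimal sketch of that approach (the thread_cpu_ns() helper is just an illustrative name):

    #include <stdint.h>
    #include <time.h>

    /* CPU time actually consumed by the calling thread, in nanoseconds;
     * time spent switched out does not count. */
    static uint64_t thread_cpu_ns(void)
    {
            struct timespec ts;

            clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
            return (uint64_t)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
    }

    /* usage:
     *      uint64_t start = thread_cpu_ns();
     *      run_task();
     *      uint64_t cpu_spent = thread_cpu_ns() - start;
     * comparing cpu_spent with wall-clock time (CLOCK_MONOTONIC) over the
     * same interval tells you how much the thread was switched out. */

On Windows, GetThreadTimes() gives comparable per-thread kernel/user times.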

Find out how much time a process is blocked waiting for I/O on Linux

Is there a vmstat type command that works per-process that allows you to see how much time a process is blocked waiting for I/O, time in kernel and user code?
blktrace is what you're looking for: block-layer information, wait/blocking/busy etc..., very in depth, and there are quite a few packages that derive from it, seekwatcher, ...
Another tool, similar to what sigjuice said, is iotop; it's also handy, but less informative for serious analysis. Also, I believe btrace/blktrace is much better suited for I/O tracing than oprofile, which is more general and adds more load in comparison.
strace will show you how much time is spent in system calls; however, it won't tell you how much of that time is spent waiting versus time really spent on I/O. You can select which system call you want to trace, or which type; it is quite powerful.
latencytop would be another good tool, because your process can be waiting for I/O because of other processes, or because of some journalling daemon.
Also look at pidstat -d. It lets you see how much each process is reading and writing.
top(1) will show this information. You can specify an individual pid with -p.
