I have implemented Derek Molloy's loadable kernel module (see listing 4). It uses a kernel module to register an interrupt handler on a GPIO rising edge, so every time there is a rising edge on a certain GPIO pin, an interrupt service routine (ISR) runs. The only thing the ISR does is increment an integer counter. I'm running Debian on the BeagleBone (Linux beaglebone 3.8.13-bone47).
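The core of the module is essentially the following (a condensed sketch; the GPIO number, names and the missing error handling are illustrative, not the exact code from listing 4):

    #include <linux/module.h>
    #include <linux/gpio.h>
    #include <linux/interrupt.h>

    #define EDGE_GPIO 115                   /* illustrative: any free GPIO pin */

    static unsigned int irq_number;
    static unsigned long edge_count;

    static irqreturn_t edge_isr(int irq, void *dev_id)
    {
        edge_count++;                       /* the only work done in the ISR */
        return IRQ_HANDLED;
    }

    static int __init edge_init(void)
    {
        gpio_request(EDGE_GPIO, "edge_gpio");       /* error handling omitted */
        gpio_direction_input(EDGE_GPIO);
        irq_number = gpio_to_irq(EDGE_GPIO);
        return request_irq(irq_number, edge_isr,
                           IRQF_TRIGGER_RISING, "edge_counter", NULL);
    }

    static void __exit edge_exit(void)
    {
        free_irq(irq_number, NULL);
        gpio_free(EDGE_GPIO);
        pr_info("edge_counter: %lu rising edges\n", edge_count);
    }

    module_init(edge_init);
    module_exit(edge_exit);
    MODULE_LICENSE("GPL");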
I put a square-wave signal onto the GPIO, causing the interrupts to trigger at a certain frequency. If I turn the frequency up to somewhere above 10 kHz, the processor freezes. I don't expect the processor to be able to keep up with this pace, but I do expect the load to be visible in the "top" command. Here is what I see:
This measurement is taken with 10 kHz kernel interrupts running, but I still only get:
%Cpu(s): 0.0 hi
"hi" is defined in man top as "time spent servicing hardware interrupts".
How can that be? How can I measure the impact the kernel interrupt has on the CPU's idle time?
I'm trying to measure the latency between kernel and userspace by triggering a periodic timer every 1 second and notifying userspace of the event (using ioctl and wake_up_interruptible).
For this I created a kernel module which uses an hrtimer, and a userspace test program which waits for the events.
To read the time, the kernel module uses getnstimeofday(), and the userspace program uses clock_gettime().
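The kernel side looks roughly like this (a condensed sketch; the names are illustrative, not my exact code):

    #include <linux/module.h>
    #include <linux/hrtimer.h>
    #include <linux/ktime.h>
    #include <linux/time.h>
    #include <linux/wait.h>

    static struct hrtimer my_timer;
    static DECLARE_WAIT_QUEUE_HEAD(my_wq);
    static int event_ready;                 /* the ioctl handler waits on my_wq/event_ready */
    static struct timespec kernel_ts;

    static enum hrtimer_restart my_timer_cb(struct hrtimer *t)
    {
        getnstimeofday(&kernel_ts);         /* kernel-side timestamp */
        printk(KERN_INFO "kernel: %ld[sec] %ld[nsec]\n",
               (long)kernel_ts.tv_sec, kernel_ts.tv_nsec);
        event_ready = 1;
        wake_up_interruptible(&my_wq);      /* notify the waiting userspace test */
        hrtimer_forward_now(t, ktime_set(1, 0));    /* re-arm for 1 second */
        return HRTIMER_RESTART;
    }

    static int __init my_init(void)
    {
        hrtimer_init(&my_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
        my_timer.function = my_timer_cb;
        hrtimer_start(&my_timer, ktime_set(1, 0), HRTIMER_MODE_REL);
        return 0;
    }

    static void __exit my_exit(void)
    {
        hrtimer_cancel(&my_timer);
    }

    module_init(my_init);
    module_exit(my_exit);
    MODULE_LICENSE("GPL");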
But the surprising thing is that the timestamps reported from userspace and from the kernel are not synchronized!
1st event:
userspace: 8866[sec] 896197992[nsec] ; kernel: 1388251190[sec] 442706727[nsec]
2nd event:
userspace: 8867[sec] 896151470[nsec] ; kernel: 1388251191[sec] 442690693[nsec]
As you can see, the kernel and userspace clocks are not synchronized,
so I can't really measure the latency between kernel and userspace events. Right?
Thank you for any ideas,
Ran
From your test results, it seems there is just an "offset" between these 2 APIs, but the delta between the 2 events in seconds is correct (1 sec difference). And from the doc
http://lxr.free-electrons.com/source/kernel/time/timekeeping.c?v=2.6.37#L101, the kernel time API "calculates the delta since the last update_wall_time". So they do not actually share the same time base. (Userspace is reading the system timer.)
Another approach is to use the Linux kernel's Ftrace facility to measure this kind of delay.
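For example (a hedged sketch, assuming debugfs is mounted at /sys/kernel/debug): add a trace_printk("timer fired\n") in the hrtimer callback, and have the userspace test write a marker into the trace as soon as it receives the event. Both lines then appear in /sys/kernel/debug/tracing/trace on the same trace clock, so the latency can be read off directly.

    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        /* Userspace half: write a marker the moment the event is received.
         * The kernel half is a trace_printk() in the hrtimer callback. */
        int fd = open("/sys/kernel/debug/tracing/trace_marker", O_WRONLY);
        if (fd < 0)
            return 1;

        /* ... block on the ioctl / wait for the wake_up_interruptible event ... */

        const char msg[] = "event received in userspace\n";
        write(fd, msg, sizeof(msg) - 1);
        close(fd);
        return 0;
    }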
I've read this link, Measure time in Linux - getrusage vs clock_gettime vs clock vs gettimeofday?, which provides a great breakdown of the timing functions available in C.
I'm very confused, however, about how the different notions of "time" are maintained by the OS and hardware.
This is a quote from the Linux man pages,
RTCs should not be confused with the system clock, which is a
software clock maintained by the kernel and used to implement
gettimeofday(2) and time(2), as well as setting timestamps on files,
and so on. The system clock reports seconds and microseconds since a
start point, defined to be the POSIX Epoch: 1970-01-01 00:00:00 +0000
(UTC). (One common implementation counts timer interrupts, once per
"jiffy", at a frequency of 100, 250, or 1000 Hz.) That is, it is
supposed to report wall clock time, which RTCs also do.
A key difference between an RTC and the system clock is that RTCs run
even when the system is in a low power state (including "off"), and
the system clock can't. Until it is initialized, the system clock
can only report time since system boot ... not since the POSIX Epoch.
So at boot time, and after resuming from a system low power state,
the system clock will often be set to the current wall clock time
using an RTC. Systems without an RTC need to set the system clock
using another clock, maybe across the network or by entering that
data manually.
The Arch Linux docs indicate that the RTC and the system clock are independent after bootup. My questions then are:
What causes the interrupts that increment the system clock?
If wall time is the time interval measured using the system clock, what does process time depend on?
Is any of this related to the CPU frequency? Or is that a totally orthogonal time-keeping business?
On Linux, from the application point of view, the time(7) man page gives a good explanation.
Linux also provides the (Linux-specific) timerfd_create(2) and related syscalls.
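For example, a minimal sketch of a periodic 1-second timer consumed entirely from userspace with timerfd (no knowledge of the underlying timer interrupts is needed):

    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>
    #include <sys/timerfd.h>

    int main(void)
    {
        struct itimerspec its = {
            .it_value    = { .tv_sec = 1, .tv_nsec = 0 },  /* first expiry in 1 s */
            .it_interval = { .tv_sec = 1, .tv_nsec = 0 },  /* then fire every second */
        };
        uint64_t expirations;
        int tfd = timerfd_create(CLOCK_MONOTONIC, 0);

        if (tfd < 0 || timerfd_settime(tfd, 0, &its, NULL) < 0)
            return 1;
        for (int i = 0; i < 5; i++) {
            /* read() blocks until the timer expires and returns the expiry count */
            if (read(tfd, &expirations, sizeof(expirations)) == sizeof(expirations))
                printf("timer expired %llu time(s)\n",
                       (unsigned long long)expirations);
        }
        close(tfd);
        return 0;
    }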
You should not care about interrupts (they are the kernel's business, and are configured dynamically, e.g. through application timers - timer_create(2), poll(2) and many other syscalls - and by the scheduler), but only about the time-related syscalls visible to applications.
Probably, if some process sets up a timer with a tiny period of e.g. 10 ms, the kernel will increase the frequency of timer interrupts to 100 Hz.
On recent kernels, you probably want the
CONFIG_HIGH_RES_TIMERS=y
CONFIG_TIMERFD=y
CONFIG_HPET_TIMER=y
CONFIG_PREEMPT=y
options in your kernel's .config file.
BTW, you could run cat /proc/interrupts twice with a 10-second interval. On my laptop with a home-built 3.16 kernel - with mostly idle processes, but a Firefox browser and an Emacs running - I'm getting 25 interrupts per second. Try also cat /proc/timer_list and cat /proc/timer_stats.
Look also in the Documentation/timers/ directory of a recent (e.g. 3.16) Linux kernel tree.
The kernel probably uses hardware devices like - for PC laptops and desktops - the on-chip HPET (or the TSC), which are much better than the old battery-backed RTC timer. Of course, the details are hardware specific, so on ARM-based Linux systems (e.g. your Android smartphone) it is different.
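To see how independent the two are, here is a hedged sketch that reads the battery-backed RTC and the kernel's software system clock side by side (assumes a readable /dev/rtc):

    #include <stdio.h>
    #include <time.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/rtc.h>

    int main(void)
    {
        struct rtc_time rtc_tm;
        struct timespec sys_ts;
        int fd = open("/dev/rtc", O_RDONLY);        /* hardware RTC */

        if (fd < 0 || ioctl(fd, RTC_RD_TIME, &rtc_tm) < 0)
            return 1;
        clock_gettime(CLOCK_REALTIME, &sys_ts);     /* kernel's software system clock */

        printf("RTC:          %04d-%02d-%02d %02d:%02d:%02d\n",
               rtc_tm.tm_year + 1900, rtc_tm.tm_mon + 1, rtc_tm.tm_mday,
               rtc_tm.tm_hour, rtc_tm.tm_min, rtc_tm.tm_sec);
        printf("system clock: %ld.%09ld seconds since the Epoch\n",
               (long)sys_ts.tv_sec, sys_ts.tv_nsec);
        close(fd);
        return 0;
    }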
I have a strange behavior that I fail to understand:
For performance measurement purposes, I'm using the 'old' parallel port interface to generate IRQs on a Debian kernel 3.2.0-4-amd64 (I am using an external signal generator connected to the ACK pin).
I wrote my own kernel module (top half only) to handle the interrupt and send a signal back out through the parallel port, and I display both signals on an oscilloscope so I can measure the kernel response time.
Everything works as expected and I can see an average response time of 70 µs with some 'bursts' of 20 µs. I'm running on an "Intel(R) Core(TM) i3-3240 CPU @ 3.40GHz".
Now, the "unexplained" part.
If I load the CPU, memory and I/O using the "stress" program, I expected the average time to get worse, but the opposite happens: my average response time drops to 20 µs.
I tried on 3 different kernels: vanilla, PREEMPT-RT, and vanilla with the NO_HZ option set to false.
Can someone explain the magic behind this?
I changed the 'governor' configuration to 'performance', but that doesn't change anything.
Your interrupt handler has a higher priority than the stress program.
So the only influence the stress program has is to prevent the CPU from sleeping, which avoids the delay the CPU needs to wake up from its sleep state when an interrupt arrives.
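You can check this explanation without stress: holding a CPU latency constraint through the PM QoS interface keeps the CPU out of its deep idle states while doing no work at all. A hedged sketch (assumes /dev/cpu_dma_latency is present, as it is on mainline kernels):

    #include <stdint.h>
    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        /* Writing a target latency of 0 us asks cpuidle to avoid deep C-states
         * for as long as this file descriptor stays open. */
        int32_t target_us = 0;
        int fd = open("/dev/cpu_dma_latency", O_WRONLY);

        if (fd < 0 || write(fd, &target_us, sizeof(target_us)) != sizeof(target_us))
            return 1;
        printf("PM QoS latency constraint held; press Ctrl-C to release\n");
        pause();            /* the constraint is dropped when the fd is closed */
        close(fd);
        return 0;
    }

If the interrupt latency drops to the same 20 µs with only this running, the sleep-state explanation fits.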
I have encountered a problem with the Linux NMI watchdog.
I want to use the Linux NMI watchdog to detect and recover from OS hangs, so I added "nmi_watchdog=1" to grub.cfg. I then checked /proc/interrupts and saw NMIs being triggered every second. But after I loaded a module with a deadlock (a double-acquire of a spinlock), the system hung completely and nothing happened (it never panicked!). It looks like the NMI watchdog did not work!
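For reference, the deadlock module is essentially the following (a simplified sketch, not my exact test code):

    #include <linux/module.h>
    #include <linux/spinlock.h>

    static DEFINE_SPINLOCK(test_lock);

    static int __init deadlock_init(void)
    {
        spin_lock(&test_lock);
        spin_lock(&test_lock);   /* second acquire of an already-held lock:
                                  * on an SMP kernel the CPU spins here forever */
        return 0;                /* never reached */
    }

    static void __exit deadlock_exit(void)
    {
    }

    module_init(deadlock_init);
    module_exit(deadlock_exit);
    MODULE_LICENSE("GPL");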
Then I read Documentation/nmi_watchdog.txt, which says:
Be aware that when using local APIC, the frequency of NMI interrupts
it generates, depends on the system load. The local APIC NMI watchdog,
lacking a better source, uses the "cycles unhalted" event.
What's the "cycles unhalted" event?
It added:
but if your system locks up on anything but the "hlt" processor
instruction, the watchdog will trigger very soon as the "cycles
unhalted" event will happen every clock tick...If it locks up on
"hlt", then you are out of luck -- the event will not happen at all
and the watchdog won't trigger.
It seems the watchdog won't trigger if the processor executes the "hlt" instruction, so I searched for "hlt" in the "Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 2A", which describes it as follows:
Stops instruction execution and places the processor in a HALT state.
An enabled interrupt (including NMI and SMI), a debug exception, the
BINIT# signal, the INIT# signal, or the RESET# signal will resume
execution.
Then I am lost...
My question is:
How does the Linux NMI watchdog work?
Who triggers the NMI?
My OS is Ubuntu 10.04 LTS, Linux-2.6.32.21, CPU Pentium 4 Dual-core 3.20 GHz.
I haven't read the whole source code of the NMI watchdog (no time). If I can't understand how the NMI watchdog works, I want to use the performance monitoring counter interrupt and the inter-processor interrupt (provided by the APIC) to send NMIs instead of relying on the NMI watchdog.
The answer depends on your hardware.
Non-maskable interrupts (NMIs) can be triggered in 2 ways: 1) when the kernel reaches a halting state that can't be interrupted by another method, and 2) by hardware, using an NMI button.
On the front of some Dell servers, for example, you will see a small circle with a zig-zag line inside it. This is the NMI symbol. Nearby there is a hole. Insert a pin to trigger the interrupt. If your kernel is built to support it, this will dump a kernel panic trace to the console, then reboot the system.
This can happen very fast. So if you don't have a console attached to save the output to a file, it might look like only a reboot.
As far as I know, the nmi_watchdog is only triggered for non-interruptible hangs. I found a code example via Google: http://oslearn.blogspot.in/2011/04/use-nmi-watchdog.html
If your deadlock is not non-interruptible, you can try enabling SysRq to trigger a trace (Alt-PrintScreen-t) or a crash (Alt-PrintScreen-c) to get more information.
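The same SysRq actions can also be triggered from userspace through procfs, which is handy on a headless box. A hedged sketch (assumes CONFIG_MAGIC_SYSRQ is enabled and root privileges):

    #include <stdio.h>

    int main(void)
    {
        /* 't' dumps a task trace to the kernel log; 'c' forces a crash/panic. */
        FILE *f = fopen("/proc/sysrq-trigger", "w");

        if (!f)
            return 1;
        fputc('t', f);   /* change to 'c' to test crash/kdump handling */
        fclose(f);
        return 0;
    }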
The file /sys/devices/system/clocksource/clocksource0/available_clocksource in my Linux box lists the following clock sources:
tsc hpet acpi_pm
I know that tsc is the Timestamp Counter Register in the Processer.
I know the hpet is the High Precision event timer.
I do not know what acpi_pm is or what hardware implements it. Is it the PIT (programmable interval timer)?
That's the ACPI Power Management Timer.
The ACPI Power Management Timer (or ACPI PMT) is yet another clock device included in almost all ACPI-based motherboards. Its clock signal has a fixed frequency of roughly 3.58 MHz. The device is actually a simple counter increased at each clock tick
(from Understanding the Linux Kernel)
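For reference, you can check (and, as root, change) which of these sources the kernel is currently using via sysfs. A minimal sketch:

    #include <stdio.h>

    int main(void)
    {
        char buf[64];
        FILE *f = fopen("/sys/devices/system/clocksource/clocksource0/current_clocksource", "r");

        if (!f || !fgets(buf, sizeof(buf), f))
            return 1;
        printf("current clocksource: %s", buf);
        fclose(f);
        /* To switch, e.g. as root:
         *   echo acpi_pm > /sys/devices/system/clocksource/clocksource0/current_clocksource */
        return 0;
    }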