Time is not synchronized between userspace and kernel - linux

I'm trying to measure the latency between kernel and userspace by triggering a periodic timer every 1 second and notifying userspace of the event (using ioctl and wake_up_interruptible).
For this I created a kernel module which uses hrtimer, and a userspace test program which waits for events.
To get the time, the kernel module uses getnstimeofday(), and the userspace program uses clock_gettime().
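The kernel side looks roughly like this (a simplified sketch of the setup described, with hypothetical names):

    #include <linux/hrtimer.h>
    #include <linux/ktime.h>
    #include <linux/time.h>
    #include <linux/wait.h>

    static struct hrtimer tick_timer;
    static DECLARE_WAIT_QUEUE_HEAD(event_wq);
    static struct timespec kernel_stamp;      /* read back via the ioctl */
    static int event_pending;

    static enum hrtimer_restart tick_fn(struct hrtimer *t)
    {
        getnstimeofday(&kernel_stamp);        /* kernel-side timestamp */
        event_pending = 1;
        wake_up_interruptible(&event_wq);     /* wake the waiting userspace test */
        hrtimer_forward_now(t, ktime_set(1, 0));
        return HRTIMER_RESTART;
    }

    /* in init: hrtimer_init(&tick_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
     *          tick_timer.function = tick_fn;
     *          hrtimer_start(&tick_timer, ktime_set(1, 0), HRTIMER_MODE_REL);
     */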
But the surprising thing is that the timestamps from userspace and kernel are not synchronized!
1st event:
userspace: 8866[sec] 896197992[nsec] ; kernel: 1388251190[sec] 442706727[nsec]
2nd event:
userspace: 8867[sec] 896151470[nsec] ; kernel: 1388251191[sec] 442690693[nsec]
As you can see, the kernel and userspace clocks are not synchronized,
so I can't really measure the latency between kernel and userspace events, right?
Thank you for any idea,
Ran

From your test results, it seems there is just an "offset" between these 2 APIs, and the delta between the 2 events is correct (1 sec difference). According to the doc at
http://lxr.free-electrons.com/source/kernel/time/timekeeping.c?v=2.6.37#L101, the kernel time API "calculates the delta since the last update_wall_time", so the two calls are not actually using the same time system. (The userspace call reads the system timer.)
Another option is to use the Linux kernel's Ftrace facility to measure this kind of delay.
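A quick way to check which clock your userspace test is actually comparing against (a sketch; CLOCK_REALTIME uses the same wall-clock base as the kernel's getnstimeofday(), while CLOCK_MONOTONIC counts from boot):

    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        struct timespec rt, mono;

        clock_gettime(CLOCK_REALTIME, &rt);    /* wall clock: same base as getnstimeofday() */
        clock_gettime(CLOCK_MONOTONIC, &mono); /* time since boot: a different base */
        printf("REALTIME:  %ld[sec] %ld[nsec]\n", (long)rt.tv_sec, rt.tv_nsec);
        printf("MONOTONIC: %ld[sec] %ld[nsec]\n", (long)mono.tv_sec, mono.tv_nsec);
        return 0;
    }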

Related

Linux: kernel interrupts' impact on CPU load

I have implemented Derek Molloy's Loadable Kernel Module (see listing 4). It registers a kernel interrupt on a GPIO rising edge, so every time there is a rising edge on a certain GPIO pin, an interrupt handler (ISR) runs. The only thing the handler does is increment an integer. I'm running Debian on the BeagleBone (Linux beaglebone 3.8.13-bone47).
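The core of the module is roughly this (simplified; the GPIO number is just an example, see Molloy's listing for the real code):

    #include <linux/gpio.h>
    #include <linux/interrupt.h>

    static unsigned int irq_number;
    static unsigned long edge_count;

    static irqreturn_t gpio_isr(int irq, void *dev_id)
    {
        edge_count++;                          /* the only work done per interrupt */
        return IRQ_HANDLED;
    }

    /* in init, for an example GPIO number:
     *   gpio_request(115, "edge_gpio");
     *   gpio_direction_input(115);
     *   irq_number = gpio_to_irq(115);
     *   request_irq(irq_number, gpio_isr, IRQF_TRIGGER_RISING, "edge_counter", NULL);
     */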
I put a square wave signal onto the GPIO, causing the interrupts to trigger at a certain frequency. If I turn the frequency up to somewhere above 10 kHz, the processor freezes. I don't expect the processor to be able to keep up with this pace, but I do expect the load to be visible in the "top" command. Here is what I see:
This measurement was taken with 10 kHz kernel interrupts running, but I still only get:
%Cpu(s): 0.0 hi
In man top, "hi" is defined as "time spent servicing hardware interrupts".
How can that be? How can I measure the impact the kernel interrupt has on the CPU's idle time?

Handling IRQ delay in linux driver

I've built a Linux driver for an SPI device.
The SPI device sends an IRQ to the processor when new data is ready to be read.
The IRQ fires about every 3 ms, and the driver then reads 2 bytes over SPI.
The problem I have is that sometimes there's more than 6 ms between the moment the IRQ fires and the moment the SPI transfer starts, which means I lose 2 bytes from the SPI device.
In addition, there's an unpredictable delay between the 2 bytes; sometimes it's close to 0, sometimes it's up to 300 µs.
So my question is: how can I reduce the latency between the IRQ and the SPI reads?
And how can I avoid the latency between the 2 bytes?
I've tried compiling the kernel with the preemption option; it does not change things much.
As for the hardware, I'm using a mini2440 board running at 400 MHz, with a hardware SPI port (not I/O-simulated SPI).
Thanks for help.
BR,
Vincent.
According to the brochure for the Samsung S3C2440A CPU, the SPI interface hardware supports both interrupt-driven and DMA-based operation. A look at the actual datasheet reveals that the hardware also supports a polling mode.
If you want to achieve high data rates reliably, the DMA-based approach is what you need. Once a DMA operation is configured, the hardware will move the data to RAM on its own, without the need for low-latency interrupt handling.
That said, I do not know the state of the Linux SPI drivers for your CPU. It could be a matter of missing support for DMA, of specific system settings or even of how you are using the driver from your own code. The details w.r.t. SPI are often highly dependent on the particular implementation...
I had a similar problem: I basically got an IRQ and needed to drain a queue via SPI in less than 10 ms or the chip would start to drop data. With high system load (ssh login was actually enough) sometimes the delay between the IRQ handler enqueueing the next SPI transfer with spi_async and the SPI transfer actually happening exceeded 11 ms.
The solution I found was the rt flag in struct spi_device (see here). Enabling it sets the thread that controls the SPI to real-time priority, which made the timing of all SPI transfers very reliable. As a bonus, that change also removes the delay before the complete callback.
Just as a heads up, I think this was not available in earlier kernel versions.
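For reference, setting it from a client driver's probe() would look roughly like this (a sketch with a hypothetical driver name, assuming a kernel recent enough to have the rt field in struct spi_device as described above):

    static int mydev_probe(struct spi_device *spi)   /* hypothetical client driver */
    {
        spi->rt = true;              /* ask the SPI core to run its message pump
                                      * thread at real-time priority */
        spi->max_speed_hz = 1000000;
        return spi_setup(spi);       /* re-apply the device settings */
    }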
The thing is, the Linux SPI stack uses queues for transmitting messages.
This means that there is no guarantee about the delay between the moment you ask to send an SPI message and the moment it is actually sent.
In the end, to fulfil my 3 ms requirement between SPI messages, I had to stop using the Linux SPI stack and write directly to the CPU's SPI registers inside my own IRQ handler.
That's quite dirty, but it was the only way to make it work with small delays.

Linux kernel has better response time under stress

I see strange behavior that I fail to understand:
For performance measurement purposes, I'm using the 'old' parallel port interface to generate IRQs on a Debian 3.2.0-4-amd64 kernel (I am using an external signal generator connected to the ACK pin).
I wrote my own kernel module (top half only) to handle the interrupt and send a signal back out through the parallel port, and I display both signals on an oscilloscope so I can measure the kernel response time.
Everything works as expected and I see an average response time of 70 µs, with occasional 'bursts' down to 20 µs. I'm running on an "Intel(R) Core(TM) i3-3240 CPU @ 3.40GHz".
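The handler itself is minimal, roughly like this (simplified; 0x378/IRQ 7 are the legacy LPT1 defaults on my machine):

    #include <linux/interrupt.h>
    #include <asm/io.h>

    #define LPT_BASE 0x378                       /* legacy LPT1 I/O base */
    #define LPT_IRQ  7

    static irqreturn_t lpt_isr(int irq, void *dev_id)
    {
        outb(0xff, LPT_BASE);                    /* pulse the data pins for the scope */
        outb(0x00, LPT_BASE);
        return IRQ_HANDLED;
    }

    /* in init: request_irq(LPT_IRQ, lpt_isr, 0, "lpt_latency", NULL);
     *          outb(0x10, LPT_BASE + 2);        // enable the IRQ on the ACK pin
     */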
Now, the "unexplained" part.
If I load the CPU, memory and I/O using the "stress" program, I would expect the average time to get worse, but the opposite happens: my average response time drops to 20 µs.
I tried 3 different kernels:
vanilla, PREEMPT-RT, and vanilla with the NO_HZ option set to false.
Can someone explain this magic?
I changed the 'governor' configuration to 'performance' but it doesn't change anything.
Your interrupt handler has a higher priority than the stress program.
So the only influence the stress program has is to prevent the CPU from entering sleep states, which avoids the delay the CPU needs to wake up from a sleep state when an interrupt arrives.
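One way to test that explanation without running stress is to hold a PM QoS request that keeps the CPU out of deep idle states, e.g. via /dev/cpu_dma_latency (a sketch; the request only holds while the file descriptor stays open):

    #include <fcntl.h>
    #include <stdint.h>
    #include <unistd.h>

    int main(void)
    {
        int32_t max_exit_latency_us = 0;   /* 0 = forbid deep C-states */
        int fd = open("/dev/cpu_dma_latency", O_WRONLY);

        if (fd < 0)
            return 1;
        write(fd, &max_exit_latency_us, sizeof(max_exit_latency_us));
        pause();                           /* keep the fd (and the request) alive */
        return 0;
    }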

About Linux NMI watchdog

I have run into a problem with the Linux NMI watchdog.
I want to use the Linux NMI watchdog to detect and recover from OS hangs, so I added "nmi_watchdog=1" to grub.cfg. Checking /proc/interrupts, NMIs were indeed being triggered every second. But after I loaded a module with a deadlock (a double-acquire spinlock), the system hung completely and nothing happened (no panic!). It looks like the NMI watchdog did not work!
Then I read Documentation/nmi_watchdog.txt, which says:
Be aware that when using local APIC, the frequency of NMI interrupts
it generates, depends on the system load. The local APIC NMI watchdog,
lacking a better source, uses the "cycles unhalted" event.
What's the "cycles unhalted" event?
It added:
but if your system locks up on anything but the "hlt" processor
instruction, the watchdog will trigger very soon as the "cycles
unhalted" event will happen every clock tick...If it locks up on
"hlt", then you are out of luck -- the event will not happen at all
and the watchdog won't trigger.
It seems the watchdog won't trigger if the processor executes the "hlt" instruction, so I searched for "hlt" in the "Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 2A", which describes it as follows:
Stops instruction execution and places the processor in a HALT state.
An enabled interrupt (including NMI and SMI), a debug exception, the
BINIT# signal, the INIT# signal, or the RESET# signal will resume
execution.
Then I am lost...
My question is:
How does Linux NMI watchdog work?
Who triggers the NMI?
My OS is Ubuntu 10.04 LTS, Linux-2.6.32.21, CPU Pentium 4 Dual-core 3.20 GHz.
I haven't read the whole NMI watchdog source code (no time). If I can't work out how the NMI watchdog works, I would like to use the performance-monitoring counter interrupt and inter-processor interrupts (provided by the APIC) to send NMIs instead of relying on the NMI watchdog.
The answer depends on your hardware.
Non-maskable interrupts (NMIs) can be triggered in 2 ways: 1) when the kernel reaches a halting state that can't be interrupted by another method, and 2) by hardware, using an NMI button.
On the front of some Dell servers, for example, you will see a small circle with a zig-zag line inside it. This is the NMI symbol. Nearby there is a hole. Insert a pin to trigger the interrupt. If your kernel is built to support it, this will dump a kernel panic trace to the console, then reboot the system.
This can happen very fast. So if you don't have a console attached to save the output to a file, it might look like only a reboot.
As far as I know, the nmi_watchdog is only triggered for non-interruptible hangs. I found a code example via Google: http://oslearn.blogspot.in/2011/04/use-nmi-watchdog.html
If your deadlock is not non-interruptible, you can try enabling SysRq to trigger a task trace (Alt-PrintScreen-t) or a crash (Alt-PrintScreen-c) to get more information.
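For reference, a hang that the NMI (hard-lockup) watchdog is designed to catch spins with interrupts disabled; a sketch of such a test module is below (with a plain spin_lock() the timer tick can still run, so only the soft-lockup detector would notice):

    #include <linux/module.h>
    #include <linux/spinlock.h>

    static DEFINE_SPINLOCK(test_lock);

    static int __init deadlock_init(void)
    {
        unsigned long flags;

        spin_lock_irqsave(&test_lock, flags);   /* first acquisition succeeds */
        spin_lock_irqsave(&test_lock, flags);   /* second acquisition spins forever
                                                  * with IRQs off on this CPU */
        return 0;
    }
    module_init(deadlock_init);
    MODULE_LICENSE("GPL");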

Linux Interrupt Handling in User Space

In Linux, what are the options for handling device interrupts in user space code rather than in kernel space?
Experience shows it is possible to write good, stable user-space drivers for almost any PCI adapter. It just requires some sophistication and a small proxying layer in the kernel. UIO is a step in that direction, but if you want to correctly handle interrupts in user space then UIO might not be enough, for example if the device doesn't support the PCI spec's interrupt-disable bit that UIO relies on.
Note that process wakeup latencies are a few microseconds, so if your implementation requires very low latency then user space might be a drag on it.
If I were to implement a user-space driver, I would reduce the kernel ISR to just a "disable & ack & wake-up-userspace" operation, handle the interrupt inside the woken process, and then re-enable the interrupt (by writing to mapped PCI memory from the userspace process).
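With UIO, the userspace half of that pattern is essentially a blocking read loop (a sketch for a device bound to uio_pci_generic, where a 4-byte write of 1 re-enables the interrupt via the driver's irqcontrol hook and read() blocks until the next interrupt, returning the total interrupt count):

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/dev/uio0", O_RDWR);
        uint32_t count, enable = 1;

        while (fd >= 0) {
            write(fd, &enable, sizeof(enable));          /* (re-)enable the interrupt */
            if (read(fd, &count, sizeof(count)) != sizeof(count))
                break;
            printf("interrupt, total so far: %u\n", count);
            /* service the device here, e.g. through the mmap()ed BARs */
        }
        return 0;
    }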
There is the Userspace I/O system (UIO), but handling should still be done in kernel space. On the other hand, if you just need to be notified of the interrupt, you don't need the kernel part.
You may want to take a look at Chapter 10: Interrupt Handling from the book Linux Device Drivers, Third Edition.
Userland code has to be triggered indirectly.
The kernel ISR indicates the interrupt by writing to a file / setting a register / signalling. The user-space application polls this and then runs the appropriate code.
Edge cases: fewer or more interrupts than expected (timeouts / too many interrupts per time interval).
The Linux file abstraction is used to connect kernel and user space. This is done with character devices and ioctl() calls. Some may prefer sysfs entries for this purpose.
This can look odd because event-triggered device notifications (interrupts) are hooked up to 'time-triggered' polling, but it is actually asynchronous blocking (read/select). Still, some questions arise regarding performance.
So interrupts cannot be directly handled outside the kernel.
E.g. shared memory can live in user space, and with some I/O permission settings addresses can be mapped, so UIO works, but not for direct interrupt handling.
I have found only one 'minority report' on this topic, VFIO (http://lxr.free-electrons.com/source/Documentation/vfio.txt):
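The kernel half of that char-device pattern is usually just a wait queue that the ISR wakes plus a poll hook on the device (a sketch with hypothetical names):

    #include <linux/atomic.h>
    #include <linux/fs.h>
    #include <linux/interrupt.h>
    #include <linux/poll.h>
    #include <linux/wait.h>

    static DECLARE_WAIT_QUEUE_HEAD(irq_wq);
    static atomic_t irq_seen = ATOMIC_INIT(0);

    static irqreturn_t my_isr(int irq, void *dev_id)
    {
        /* ack/mask the hardware here if needed */
        atomic_set(&irq_seen, 1);
        wake_up_interruptible(&irq_wq);        /* unblock read()/select() in user space */
        return IRQ_HANDLED;
    }

    static unsigned int my_poll(struct file *file, poll_table *wait)
    {
        poll_wait(file, &irq_wq, wait);
        return atomic_read(&irq_seen) ? (POLLIN | POLLRDNORM) : 0;
    }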
https://stackoverflow.com/a/21197797/5349798
Similar questions:
Running user thread in context of an interrupt in linux
Is it possible in linux to register a interrupt handler from any user-space program?
Linux Kernel: invoke call back function in user space from kernel space
Linux Interrupt vs. Polling
Linux user space PCI driver
How do I inform a user space application that the driver has received an interrupt in linux?
