Linux kernel has better response time under stress

I have a strange behavior that I fail to understand:
For performance measurement purposes, I'm using the 'old' parallel port interface to generate IRQs on a Debian kernel 3.2.0-4-amd64 (I am using an external signal generator connected to the ACK pin).
I wrote my own kernel module (top half only) to handle the interrupt and send a signal back out on the parallel port, and I display both signals on an oscilloscope so I can measure the kernel response time.
Everything works as expected and I see an average response time of 70 µs, with occasional 'bursts' of 20 µs. I'm running on an "Intel(R) Core(TM) i3-3240 CPU @ 3.40GHz".
Now, the "unexplained" part.
If I load the CPU, memory and I/O using the "stress" program, I expected the average time to get worse, but the opposite happens: my average response time drops to 20 µs.
I tried on three different kernels: vanilla, PREEMPT_RT, and vanilla with the NO_HZ option set to false.
Can someone explain the magic of this?
I changed the 'governor' configuration to 'performance' but it doesn't change anything.

Your interrupt handler has a higher priority than the stress program.
So the only influence the stress program has is to keep the CPU from sleeping, which avoids the delay the CPU needs to wake up from its sleep state (a deep C-state) when an interrupt arrives.
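You can test this explanation without running any artificial load by using the kernel's PM QoS interface. A minimal sketch, assuming /dev/cpu_dma_latency is available and you run it as root (the file must stay open for the request to stay active):

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Writing a latency value (in µs) to /dev/cpu_dma_latency asks the
     * kernel's PM QoS layer to avoid idle states with a longer exit
     * latency; 0 keeps the CPUs out of deep C-states entirely. */
    int fd = open("/dev/cpu_dma_latency", O_WRONLY);
    if (fd < 0) { perror("open"); return 1; }

    int32_t max_latency_us = 0;
    if (write(fd, &max_latency_us, sizeof(max_latency_us)) < 0) {
        perror("write");
        return 1;
    }

    pause(); /* the request is dropped when the fd is closed, so block here */
    return 0;
}

If the average response time drops to around 20 µs with this running and no other load, the C-state wake-up latency was indeed the dominant cost.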


X86 clear interrupt flag instruction `cli` not working in user space?

I'm trying to stop interrupts from user space on a specific isolated core,
so I set the CPU affinity:
#define _GNU_SOURCE   /* for CPU_ZERO/CPU_SET and sched_setaffinity */
#include <sched.h>
#include <assert.h>
#include <unistd.h>
cpu_set_t set;
CPU_ZERO(&set);
CPU_SET(2, &set);     /* pin this process to core 2 */
assert(sched_setaffinity(getpid(), sizeof(set), &set) == 0);
and use iopl(3) to be able to execute the privileged cli/sti instructions in user space:
#include <sys/io.h>   /* iopl */

iopl(3);              /* I/O privilege level 3 also permits cli/sti; needs root */
__asm__("cli");       /* clear IF: mask maskable interrupts on this core */
/* busy looping for a while */
__asm__("sti");       /* set IF: re-enable interrupts */
and there are two phenomena I can't explain:
1. cli can't actually stop interrupts (at least not all of them); interrupts such as LOC (the local timer interrupt) still come in every now and then. I notice that recent kernel patches prevent cli in user space (reference), but this result can be reproduced on kernel 4.19.0.
2. AFAIK, cli only clears the interrupt flag of the CPU the program is running on, but in practice my whole system is stuck, not responding to my mouse or keyboard.
(2): Many parts of the Linux kernel depend on communicating with other cores, including RCU, which depends on running code on each core (run_on(core) and the like; see https://lwn.net/Articles/262464/). Any kernel code doing that will get stuck when this core doesn't respond to the IPIs other cores send, for example to ask the kernel on this core to switch to a certain task, or perhaps to do TLB shootdowns.
I don't know which exact mechanism leads to the hang, but I don't find it surprising at all that other parts of the kernel end up waiting for something that depends on hearing back from the kernel on this core, and that this blocks progress of whatever is involved in getting keyboard/mouse events to an X server and on to user space. (Or even to a text console? That might have more hope; fewer layers of software.)
Or it's always possible that some keyboard or mouse interrupts get distributed to this core, and ignored.
As for (1): do you have the NMI watchdog enabled, or another source of NMIs? That could temporarily get the kernel running in a state where (other?) interrupts are enabled.
I use kernel/nmi_watchdog = 0 in /etc/sysctl.d/99-local.conf to free up an extra perf counter, but the default is enabled.
(cli doesn't stop Non-Maskable Interrupts, as you might guess from the name.)
Other than that guess, I don't know why you'd still be seeing occasional LOC (local timer) interrupts; maybe someone more familiar with modern x86 interrupt handling would know.
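One way to see what is still being delivered to the isolated core is to sample /proc/interrupts before and after the cli window and compare the NMI and LOC rows. A rough sketch (the row matching is naive, for illustration only):

#include <stdio.h>
#include <string.h>

/* Print the NMI and LOC rows of /proc/interrupts; run once before and
 * once after the busy loop and diff the per-core columns. */
static void dump_nmi_loc(void)
{
    char line[4096];
    FILE *f = fopen("/proc/interrupts", "r");
    if (!f) { perror("fopen"); return; }
    while (fgets(line, sizeof(line), f)) {
        if (strstr(line, "NMI:") || strstr(line, "LOC:"))
            fputs(line, stdout);
    }
    fclose(f);
}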

ksoftirqd behaviour on high network traffic

I am running Wind River Linux on MIPS (Octeon) based hardware.
Linux runs on 16 cores and we have ksoftirqd/0 to ksoftirqd/15 running.
I observe the following load-balancing behavior under high incoming traffic (like a ping flood):
First, ksoftirqd/0 takes all the load until it reaches somewhere around 96-97% CPU.
Once cpu0 reaches 96-97% usage, ksoftirqd/1 starts taking load and the CPU usage of cpu1 starts increasing.
As more traffic is pumped in, cpu1 reaches 96-97% and cpu2 starts taking load. And so it goes on until ksoftirqd/15 takes 96-97% as the incoming traffic increases.
Is this expected behaviour?
Could you please let me know whether this is default Linux behavior or an improvement made by Wind River?
Thanks a lot,
Vasudev
The Cavium MIPS Ethernet driver has logic to send an inter-processor interrupt (IPI) to other cores to make them take part of the load under certain conditions.
Whenever the backlog crosses a certain limit, an IPI is sent to another core, and the handler for that IPI is essentially nothing but the NAPI poll logic.
Hence the behavior: each additional core is pulled in only once the previous one is saturated.
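In outline, the pattern looks like this (a much-simplified sketch, not the actual Cavium driver; the threshold, the per-cpu NAPI instance, and the function names are made up for illustration):

#include <linux/smp.h>
#include <linux/netdevice.h>

#define BACKLOG_IPI_THRESHOLD 128   /* hypothetical backlog limit */

static DEFINE_PER_CPU(struct napi_struct, rx_napi);   /* hypothetical */

/* Runs on the target core in IPI context: just kick NAPI there,
 * so that core starts polling the RX queues. */
static void backlog_ipi_handler(void *info)
{
    napi_schedule(this_cpu_ptr(&rx_napi));
}

/* Called from the RX path on the overloaded core. */
static void maybe_spread_load(int backlog, int next_cpu)
{
    if (backlog > BACKLOG_IPI_THRESHOLD)
        smp_call_function_single(next_cpu, backlog_ipi_handler,
                                 NULL, 0 /* don't wait for completion */);
}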

Linux; Kernel interrupts impacts on CPU-load

I have implemented Derek Molloy's Loadable Kernel Module (see listing 4). The kernel module registers an interrupt handler for the rising edge of a GPIO pin, so every time there is a rising edge on that pin, an interrupt service routine (ISR) runs. The only thing the ISR does is increment an integer counter. I'm running Debian on the BeagleBone (Linux beaglebone 3.8.13-bone47). The relevant part looks roughly like the sketch below.
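(A sketch in the style of Molloy's listing, not his exact code; the GPIO number is an arbitrary example:)

#include <linux/module.h>
#include <linux/interrupt.h>
#include <linux/gpio.h>

#define EDGE_GPIO 115               /* arbitrary example pin */

static unsigned int irq_number;
static unsigned long edge_count;    /* incremented on every rising edge */

/* Top half only: count the edge and return. */
static irqreturn_t gpio_isr(int irq, void *dev_id)
{
    edge_count++;
    return IRQ_HANDLED;
}

static int __init gpio_counter_init(void)
{
    gpio_request(EDGE_GPIO, "edge-counter");
    gpio_direction_input(EDGE_GPIO);
    irq_number = gpio_to_irq(EDGE_GPIO);
    return request_irq(irq_number, gpio_isr,
                       IRQF_TRIGGER_RISING, "gpio_edge_counter", NULL);
}

static void __exit gpio_counter_exit(void)
{
    free_irq(irq_number, NULL);
    gpio_free(EDGE_GPIO);
}

module_init(gpio_counter_init);
module_exit(gpio_counter_exit);
MODULE_LICENSE("GPL");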
I put a square wave signal onto the GPIO, causing the interrupt to trigger at a known frequency. If I turn the frequency up to somewhere above 10 kHz, the processor freezes. I don't expect the processor to be able to keep up with this pace, but I do expect the load to be visible via the "top" command. Here is what I see:
This measurement is taken with the 10 kHz kernel interrupts running, but I still only get:
%Cpu(s): 0.0 hi
"hi" is defined as "time spent servicing hardware interrupts" in man top.
How can that be? How can I measure the impact the kernel interrupts have on the CPU's idle time?

How to find the cause of delay?

A program I'm working on needs to process certain objects upon arrival from network in real-time. The throughput is good, but I have occasional drops in the input queue due to unexpected delays.
My analysis shows that the source of the delay is most probably outside my program: something like another process being scheduled on my process's CPU core (I set the affinity of the process to a certain core), or a hardware interrupt arriving (perhaps a network interrupt).
My problem is I don't know the source of the delay for sure. Is there a tool or a method to find how a CPU core was used exactly during a certain period of time? (Like for example telling me that core 0 was used by process 19494 99.1 percent of the time, process 20001 0.8 percent of the time and process 8110 0.1 percent of the time.)
I use Ubuntu 14.04 Server Edition on an HP server with a Xeon CPU.
The cause could be CPU, disk speed, network speed, or memory.
Memory and CPU usage are easy to spot using htop (use the sort option, F6).
Disk speed could be an issue, for example if you use low-energy disks (they slow down when not in use). Do you have a database running on the same system?
Use iotop; it might give a clue.

Evaluating SMI (System Management Interrupt) latency on Linux-CentOS/Intel machine

I am interested in evaluating the behavior (latency, frequency) of SMI handling on Linux machine running CentOS and used for a (very) soft real time application.
What tools are recommended (hwlatdetect for CentOS?), and what is the best course of action to go about this?
If no good tools are available for CentOS, am I correct in assuming that installing a different OS on the same machine should yield the same results, since the underlying hardware/BIOS are the same?
Is there any source of ballpark figures for these parameters?
The machines are x86_64 architecture, running CentOS 6.4 (kernel 2.6.32-358.23.2.el2.centos.plus.x86_64).
SMIs can certainly happen during normal operation. My home desktop has a chipset-driven SMI every second and a half which is enabled in the chipset. I've also seen some servers that have them twice a second due to a BIOS-driven CPU frequency scaling scheme. However, some systems can go long periods of time without an SMI occurring so it really depends.
Question #1: hwlatdetect is one option to detect the latency of SMIs occurring on your system. BIOSBITS is another option; it is a bootable CD that can identify whether SMIs are occurring. You can also write your own test by creating a kernel module that spins in a loop and takes timestamps (using RDTSC). If you see a long gap between two timestamp readings, you can consult CPU MSR 0x34 to see if the SMI counter incremented, which would indicate that an SMI happened.
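The SMI counter can also be read from user space through the msr driver, which avoids writing a module just to poll it. A sketch, assuming an Intel CPU, root privileges, and that the msr module is loaded (modprobe msr):

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define MSR_SMI_COUNT 0x34   /* Intel: SMIs since reset */

int main(void)
{
    /* The msr driver exposes each core's MSRs as a seekable file;
     * the MSR address is the file offset. */
    int fd = open("/dev/cpu/0/msr", O_RDONLY);
    if (fd < 0) { perror("open (is the msr module loaded?)"); return 1; }

    uint64_t smi_count;
    if (pread(fd, &smi_count, sizeof(smi_count), MSR_SMI_COUNT)
            != sizeof(smi_count)) {
        perror("pread");
        return 1;
    }
    printf("SMIs since reset: %llu\n", (unsigned long long)smi_count);
    close(fd);
    return 0;
}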
If you want to generate an SMI, you can make a kernel module that does an OUT CPU instruction to port 0xb2, e.g. write a value of 0 to this port. (You can also time this SMI by gathering a timestamp just before and just after the write to port 0xB2.)
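From user space the same trigger-and-time experiment can be done with ioperm() instead of a kernel module; whether writing 0 actually triggers an SMI depends on the BIOS, so treat this as a sketch:

#include <stdio.h>
#include <sys/io.h>      /* ioperm, outb */
#include <x86intrin.h>   /* __rdtsc */

#define APM_CONTROL_PORT 0xb2   /* classic SMI trigger port */

int main(void)
{
    if (ioperm(APM_CONTROL_PORT, 1, 1) != 0) {   /* needs root */
        perror("ioperm");
        return 1;
    }
    unsigned long long before = __rdtsc();
    outb(0, APM_CONTROL_PORT);   /* the OUT completes after the SMI is handled */
    unsigned long long after = __rdtsc();
    printf("SMI round trip: %llu TSC cycles\n", after - before);
    return 0;
}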
Question #2: SMIs operate at a layer below the OS, so which OS you choose shouldn't have any impact.
Question #3: BIOSBITS recommends that SMI latencies be kept under 150 microseconds.
An SMI puts your system into System Management Mode (SMM), which postpones normal execution of the kernel for the duration of the SMI handling. In other words, SMM is neither real mode nor protected mode as we know them from normal kernel operation; instead the CPU executes special code kept in SMRAM (stored in the BIOS firmware). To detect its latency, you can try to trigger an SMI (it can be software generated) and measure the total time spent in SMM mode. To accomplish this you can write a Linux kernel module, because you'll require special privileges to issue an SMI (I think).
For real-time systems, I think it's best if you can avoid interrupts like SMIs altogether.
You can check whether System Management Interrupts (SMIs) are being serviced with turbostat. For example:
# turbostat sleep 120
[check column SMI for value greater than 0]
Of course, from that you can also compute an SMI frequency.
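For example (with made-up numbers): if the SMI column increases by 240 over the 120-second sleep, the machine is taking 2 SMIs per second.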
Knowing that SMIs are actually happening at a certain rate is important information, but you also want to know how much time the system spends in System Management Mode (SMM) during those interrupts. For example, if an SMI interruption is only very short, then it might be irrelevant for your real-time application. On the other hand, if you have hardware with long SMI interruptions you probably want to talk to the vendor, configure the firmware differently (if possible), and/or switch to other hardware with a less intrusive SMM.
The perf tool has a mode that measures how many cycles are spent in SMM during SMIs (using the information provided by certain CPU counters). Example:
# perf stat -a -A --smi-cost -- sleep 120
Performance counter stats for 'system wide':
SMI cycles% SMI#
CPU0 0.0% 0
CPU1 0.0% 0
CPU2 0.0% 0
CPU3 0.0% 0
120.002927948 seconds time elapsed
You can also look at the raw values with:
# perf stat -a -A --smi-cost --metric-only -- sleep 120
From that you can compute how much time an SMI takes on average on your machine (divide the difference in cycles by the number of cycles per time unit).
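A worked example with made-up numbers: if the counters show 2.04 × 10^8 cycles spent in SMM and 500 SMIs on a 3.4 GHz core, that is 408,000 cycles per SMI, or about 120 µs each (408,000 / 3,400 cycles per µs).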
It certainly makes sense to cross check the CPU counter based results with empiric ones.
You can use the Linux Hardware Latency Detector that is integrated in the Linux Kernel. Usage example:
# echo hwlat > /sys/kernel/debug/tracing/current_tracer
# echo 1 > /sys/kernel/debug/tracing/tracing_thresh
# watch -d -n 5 cat /sys/kernel/debug/tracing/tracing_max_latency
# echo "Don't forget to disable it again"
# echo nop > /sys/kernel/debug/tracing/current_tracer
Those tools are available on CentOS/RHEL 7 and should be available on other distributions, as well.
Regarding ballpark figures: recently I came across a 2011-ish HP ProLiant Gen8 Xeon server that fires 504 SMIs per minute. perf computes a rate of 0.1% in SMM, and based on the counter values the average time spent in an SMI is as high as several microseconds; but the Linux hwlat detector doesn't detect such high interruptions on that system.
That SMI rate matches what HP documents in its "Configuring and tuning HPE ProLiant Servers for low-latency applications" guide (October 2017):
Disabling System Management Interrupts to the processor provides one of the greatest benefits to low-latency environments.
Disabling the Processor Power and Utilization Monitoring SMI has the greatest effect because it generates a processor interrupt eight times a second in G6 and later servers.
(emphasis mine; and that guide also documents other SMI sources)
On a Supermicro board with Intel Atom C3758 and an Intel NUC (i5-4250U) system of mine there are exactly zero SMIs counted.
On an Intel i7-6600U based Dell laptop, the system reports 8 SMIs per minute, but the aperf counter is lower than the (unhalted) cycles counter, which isn't supposed to happen.
According to the Wikipedia page on System Management Mode, SMI is not used during normal operation, except perhaps to emulate a PS/2 keyboard with a USB physical keyboard.
And most Linux systems are able to drive a genuine USB keyboard without that emulation; you could configure your BIOS to disable it.
Actually, SMI is used for more than just keyboard emulation: servers use SMI to report and correct ECC memory errors, ACPI uses SMI to communicate with the BIOS and perform some tasks, even enabling and disabling ACPI is done through SMI, and the BIOS often intercepts power state changes through SMI... there's more; these are just a few examples.
