Lowering linux kernel timer frequency - linux

When I run my Virtual Machine with Gentoo as guest, I have found that there is considerable overhead coming from tick_periodic function. (This is the function which runs on every timer interrupt.) This function updates a global jiffy using write_seqlocks which leads to the overhead.
Here's a grep of HZ and relevant stuff in my kernel config file.
sharan013#sitmac4:~$ cat /boot/config | egrep 'HZ|TIME'
# CONFIG_RCU_FAST_NO_HZ is not set
CONFIG_NO_HZ=y
# CONFIG_HZ_100 is not set
# CONFIG_HZ_250 is not set
# CONFIG_HZ_300 is not set
CONFIG_HZ_1000=y
CONFIG_HZ=1000
# CONFIG_MACHZ_WDT is not set
CONFIG_TIMERFD=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_X86_CYCLONE_TIMER=y
CONFIG_HPET_TIMER=y
Clearly it has set the configuration to 1000, but when I do sysconf(_SC_CLK_TCK), I get 100 as my timer frequency. So what is my system's timer frequency?
What I want to do is to bring the frequency down to 100, even lower if possible. Although it might effect the interactivity and precision of poll/select and schedulers time slice, I am ready to sacrifice these things for lesser timer interrupt as it will speed up VM.
When I tried to find out what has to be done I read in some place that you can do so by changing in the configuration file, else where I read that adding divider=10 to the boot parameter does the job, else where I read that none of it is needed if you can set the CONFIG_HIGH_RES_TIMERS to acheive low-latency timers even without increasing the timer frequency and the same is possible with a tickless system CONFIG_NO_HZ.
I am extermely confused about what is the right approach.
All I want is to bring down the timer interrupt to as low as possible.
Can I know the right way of doing this?

Don't worry! Your confusion is nothing but expected. Linux timer interrupts are very confusing and have had a long and quite exciting history.
CLK_TCK
Linux has no sysconf system call and glibc is just returning the constant value 100. Sorry.
HZ <-- what you probably want
When configuring your kernel you can choose a timer frequency of either 100Hz, 250Hz, 300Hz or 1000Hz. All of these are supported, and although 1000Hz is the default it's not always the best.
People will generally choose a high value when they value latency (a desktop or a webserver) and a low value when they value throughput (HPC).
CONFIG_HIGH_RES_TIMERS
This has nothing to do with timer interrupts, it's just a mechanism that allows you to have higher resolution timers. This basically means that timeouts on calls like select can be more accurate than 1/HZ seconds.
divider
This command line option is a patch provided by Red Hat. You can probably use this (if you're using Red Hat or CentOS), but I'd be careful. It's caused lots of bugs and you should probably just recompile with a different Hz value.
CONFIG_NO_HZ
This really doesn't do much, it's for power saving and it means that the ticks will stop (or at least become less frequent) when nothing is executing. This is probably already enabled on your kernel. It doesn't make any difference when at least one task is runnable.
Frederic Weisbecker actually has a patch pending which generalizes this to cases where only a single task is running, but it's a little way off yet.

Related

How to implement sleep utility in RISC-V?

I want to implement sleep utility that receives number of seconds as an input and pauses for given seconds on a educatational xv6 operation system that runs on risc-v processors.
The OS already have system call that get number of ticks and pauses: https://github.com/mit-pdos/xv6-riscv/blob/riscv/kernel/sysproc.c#L56
Timers are initialized using a timer vector: https://github.com/mit-pdos/xv6-riscv/blob/riscv/kernel/kernelvec.S#L93
The timer vector is initialized with CLINT_MTIMECMP function that tells timer controller when to wake the next interrupt.
What I do not understand is how to know the time between the ticks and how many ticks are done during 1 second.
Edit: A quick google of "qemu timebase riscv mtime" found a google groups chat which states that RDTIME is nanoseconds since boot and mtime is an emulated 10Mhz clock.
I haven't done a search to find the information you need, but I think I have some contextual information that would help you find it. I would recommend searching QEMU documentation / code (probably from Github search)for how mtime and mtimecmp work.
In section 10.1 (Counter - Base Counter and Timers) of specification1, it is explained that the RDTIME psuedo-instruction should have some fixed tick rate that can be determined based on the implementation 2. That tick rate would also be shared for mtimecmp and mtime as defined in the privileged specification 3.
I would presume the ticks used be the sleep system call would be the same as these ticks from the specifications. In that case, xv6 is just a kernel and wouldn't then define how many ticks/second there are. It seems that xv6 is made to run on top of qemu so the definition of ticks/second should be defined somewhere in the qemu code and might be documented.
From the old wiki for QEMU-riscv it should be clear that the SiFive CLINT defines the features xv6 needs to work, but I doubt that it specifies how to know the tickrate. Spike also supports the CLINT interface so it may also be instructive to search for the code in spike that handles it.
1 I used version 20191213 of the unprivileged specification as a reference
2
The RDTIME pseudoinstruction reads the low XLEN bits of the time CSR, which counts wall-clock
real time that has passed from an arbitrary start time in the past. RDTIMEH is an RV32I-only in-
struction that reads bits 63–32 of the same real-time counter. The underlying 64-bit counter should
never overflow in practice. The execution environment should provide a means of determining the
period of the real-time counter (seconds/tick). The period must be constant. The real-time clocks
of all harts in a single user application should be synchronized to within one tick of the real-time
clock. The environment should provide a means to determine the accuracy of the clock.
3
3.1.10
Machine Timer Registers (mtime and mtimecmp)
Platforms provide a real-time counter, exposed as a memory-mapped machine-mode read-write
register, mtime. mtime must run at constant frequency, and the platform must provide a mechanism
for determining the timebase of mtime.

Is there a way to disable CPU cache (L1/L2) on a Linux system?

I am profiling some code on a Linux system (running on Intel Core i7 4500U) to obtain the time of ONLY the execution costs. The application is the demo mpeg2dec from libmpeg2. I am trying to obtain a probability distribution for the mpeg2 execution times. However we want to see the raw execution cost when cache is switched off.
Is there a way I can disable the cpu cache of my system via a Linux command, or via a gcc flag ? or even set the cpu (L1/L2) cache size to 0KB ? or even add some code changed to disable cache ? Of course, without modifying or rebuilding the kernel.
See this 2012 thread, someone posted a tiny kernel module source to disable cache through asm.
http://www.linuxquestions.org/questions/linux-kernel-70/disabling-cpu-caches-936077/
If disabling the cache is really necessary, then so be it.
Otherwise, to know how much time a process takes in terms of user or system "cycles", then I would recommend the getrusage() function.
struct rusage usage;
getrusage(RUSAGE_SELF, &usage);
You can call it before/after your loop/test and subtracted the values to get a good idea of how much time your process took, even if many other processes run in parallel on the same machine. The main problem you'd get is if your process start swapping. In that case your timings will be off.
double user_usage = usage.ru_utime.tv_sec + usage.ru_utime.tv_usec / 1000000.0;
double system_uage = usage.ru_stime.tv_sec + usage.ru_stime.tv_usec / 1000000.0;
This is really precise from my own experience. To increase precision, you could be root when running your test and give it a negative priority (-1 or -2 is enough.) Then it won't be swapped out until you call a function that may require it.
Of course, you still get the effect of the cache... assuming you do not handle very large amount of data with code that goes on and on (opposed to having a loop).

How to change the system clock rate or OS clock rate?

I want to know is there any way to change Windows or Linux clock rate or the system clock rate (maybe via BIOS)? I mean accelerate or decelerate system clock!
For example every 24 hours in the computer lasts 12 hours or 36 hours in real!!!
NOTE :
Using the below batch file, I can decelerate Windows time. But I want something in a lower level! I want to change clock pace in a way that all time for all the programs and tool be slower or faster! not only Windows time!
#echo off
:loop
set T=%time%
timeout 1
time %T%
timeout 1
goto loop
So your CPU's clock is not actually programmable via system calls. It's actually working off of an oscillator w/ crystal. You cannot change during booting up. This is done intentionally so that your CPU is able time regardless of your power/wifi/general system status.
As commented by That Other Guy you might perhaps use adjtimex(2) syscall, but you first should be sure that no NTP client daemon -which uses adjtimex- is running (so stop any ntpd or chrony service).
I'm not sure it would work, and it might make your system quite unstable.
A more rude possibility might be to forcibly set the date(1) -or also hwclock(8)- quite often (e.g. in some crontab job running every 5 minutes).
I believe it (i.e. decelerating a lot the system clock) is a strange and bad thing to do. Don't do that on a production machine (or even on some machine doing significant requests on the Web). Be prepared to perhaps break a lot of things.

Cyclictest for RT patched Linux Kernel

Hello I patched the Linux kernel with the RT-Patch and tested it with the Cyclinctest which monitors latencies. The Kernel isn't doing good and not better than the vanilla kernel.
https://rt.wiki.kernel.org/index.php/Cyclictest
I checked the uname for RT, which looks fine.
So I checked the requirements for the cyclinctest and it states that I have to make sure that the following is configured within the kernel config:
CONFIG_PREEMPT_RT=y
CONFIG_WAKEUP_TIMING=y
CONFIG_LATENCY_TRACE=y
CONFIG_CRITICAL_PREEMPT_TIMING=y
CONFIG_CRITICAL_IRQSOFF_TIMING=y
The Problem now arising is that the config doesn't contain such entries. Maybe there are old and the they may be renamed in the new patch versions (3.8.14)?
I found options like:
CONFIG_PREEMPT_RT_FULL=y
CONFIG_PREEMPT=y
CONFIG_PREEMPT_RT_BASE=y
CONFIG_HIGH_RES_TIMERS=y
Is that enought in the 3.x kernel to provide the required from above? Anyone a hint?
There's a lot that must be done to get hard realtime performance under PREEMPT_RT. Here are the things I am aware of. Entries marked with an asterisk apply to your current position.
Patch the kernel with PREEMPT_RT (as you already did), and enable CONFIG_PREEMPT_RT_FULL (which used to be called CONFIG_PREEMPT_RT, as you correctly derived).
Disable processor frequency scaling (either by removing it from the kernel configuration or by changing the governor or its settings). (*)
Reasoning: Changing a core's frequency takes a while, during which the core does no useful work. This causes high latencies.
To remove this, look under the ACPI options in the kernel settings.
If you don't want to remove this capability from the kernel, you can set the cpufreq governor to "performance" to lock it into its highest frequency.
Disable deep CPU sleep states
Reasoning: Like switching frequencies, Waking the CPU from a deep sleep can take a while.
Cyclictest does this for you (look up /dev/cpu_dma_latency to see how to do it in your application).
Alternatively, you can disable the "cpuidle" infrastructure in the kernel to prevent this from ever occurring.
Set a high priority for the realtime thread, above 50 (preferably 99) (*)
Reasoning: You need to place your priority above the majority of the kernel -- much of a PREEMPT_RT kernel (including IRQs) runs at a priority of 50.
For cyclictest, you can do this with the "-p#" option, e.g. "-p99".
Your application's memory must be locked. (*)
Reasoning: If your application's memory isn't locked, then the kernel may need to re-map some of your application's address space during execution, triggering high latencies.
For cyclictest, this may be done with the "-m" option.
To do this in your own application, see the RT_PREEMPT howto.
You must unload the nvidia, nouveau, and i915 modules if they are loaded (or not build them in the first place) (*)
Reasoning: These are known to cause high latencies. Hopefully you don't need them on a realtime system :P
Your realtime task must be coded to be realtime
For example, you cannot do file access or dynamic memory allocation via malloc(). Many system calls are off-limits (it's hard to find which ones are acceptable, IMO).
cyclictest is mostly already coded for realtime operation, as are many realtime audio applications. You do need to run it with the "-n" flag, however, or it will not use a realtime-safe sleep call.
The actual execution of cyclictest should have at least the following set of parameters:
sudo cyclictest -p99 -m -n

Change linux kernel timer

I have to run a latency sensitive application and I have been asked to change the timer resolution to 1000 Hz or more. I searched on the net a bit and found pages about CONFIG_HZ etc.
However, there are what seem to be several other related settings in the file as well, so I want to be sure that I don't mess the settings up. I am posting some output here
$cat /boot/config-2.6.28-11-generic | grep HZ
# CONFIG_HZ_1000 is not set
# CONFIG_HZ_300 is not set
CONFIG_MACHZ_WDT=m
CONFIG_NO_HZ=y
CONFIG_HZ=250
# CONFIG_HZ_100 is not set
CONFIG_HZ_250=y
So does the second line, "# CONFIG_HZ_1000 is not set", mean that 1000Hz is not supported? Do I have to change just CONFIG_HZ and not CONFIG_HZ-250?
PS: I'm using the 2.6 kernel (ubuntu jaunty) on a Geode processor.
EDIT1: I ran some code from http://www.advenage.com/topics/linux-timer-interrupt-frequency.php on my desktop machine and the development machine. The code supposedly is an accurate measure of how fast a timer the system can sustain. The output was approximately 183 Hz (on the development machine). So will changing the timer be meaningful?
Don't edit .config directly, unless you're a Kbuild expert (and if you're asking this, you're not a Kbuild expert). Instead run make menuconfig or make xconfig to load the menu-based configuration system. Alternately, make config will do a line-based configuration process (where it asks you several hundred questions about what to configure - not recommended). The relevant option is under "Processor type and features" as "Timer frequency".
That said, this may not be necessary. Modern Linux can use high-resolution event timers (CONFIG_HIGH_RES_TIMERS) to acheive low-latency timers even without increasing the timer frequency. With a tickless system (CONFIG_NO_HZ) , the timer frequency has little effect at all.
On the other hand, I'm not sure what timer support Geode CPUs have. You may want to run cyclictest with various kernel configurations to see what you need to get low latency performance. The test you ran tests maximum dispatch frequency, not dispatch latency, so comparing with cyclictest results would be interesting. If you need really low latency, the CONFIG_PREEMPT_RT patchset may also be of interest.
To change the timer setting you need to recompile the kernel. Change the option in some standard menu configuration tool, rather than the text file.
/boot/config... files only tell you what is installed in the specific kernel binary. This is not a configuration file you can change.
does the second line, # CONFIG_HZ_1000 is not set, mean that 1000Hz is not supported?
When a config option is not available it's just not present in the .config file.
For instance, those kernel options:
# CONFIG_HZ_1000 is not set
# CONFIG_HZ_300 is not set
are available for you to set.
To set them, the safest is to use a menu based interface like make menuconfig.
In menuconfig, to find out the location of a given kernel config parameter, type / to open the search box.

Resources