Timing mechanisms in computer systems - Linux

I've read the question Measure time in Linux - getrusage vs clock_gettime vs clock vs gettimeofday?, which provides a great breakdown of the timing functions available in C.
I'm very confused, however, about how the different notions of "time" are maintained by the OS and hardware.
This is a quote from the Linux man pages (rtc(4)):
RTCs should not be confused with the system clock, which is a
software clock maintained by the kernel and used to implement
gettimeofday(2) and time(2), as well as setting timestamps on files,
and so on. The system clock reports seconds and microseconds since a
start point, defined to be the POSIX Epoch: 1970-01-01 00:00:00 +0000
(UTC). (One common implementation counts timer interrupts, once per
"jiffy", at a frequency of 100, 250, or 1000 Hz.) That is, it is
supposed to report wall clock time, which RTCs also do.
A key difference between an RTC and the system clock is that RTCs run
even when the system is in a low power state (including "off"), and
the system clock can't. Until it is initialized, the system clock
can only report time since system boot ... not since the POSIX Epoch.
So at boot time, and after resuming from a system low power state,
the system clock will often be set to the current wall clock time
using an RTC. Systems without an RTC need to set the system clock
using another clock, maybe across the network or by entering that
data manually.
The Arch Linux docs indicate that the RTC and system clock are independent after bootup. My questions then are:
What causes the interrupts that increment the system clock?
If wall time is what you measure with the system clock, what does process time depend on?
Is any of this related to the CPU frequency, or is that a totally orthogonal time-keeping business?
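To make these notions concrete, here is a minimal sketch (standard POSIX clock_gettime(2), not taken from any answer below; on older glibc you may need to link with -lrt) that reads a wall-clock, a monotonic, and a per-process CPU-time clock side by side:

/* Sketch: the different notions of time exposed to user space.
 * CLOCK_REALTIME           - wall-clock time (what the system clock tracks),
 * CLOCK_MONOTONIC          - time since an unspecified start point (roughly boot),
 * CLOCK_PROCESS_CPUTIME_ID - CPU time consumed by this process only. */
#include <stdio.h>
#include <time.h>

static void show(const char *name, clockid_t id)
{
    struct timespec ts;
    if (clock_gettime(id, &ts) == 0)
        printf("%-26s %lld.%09ld s\n", name, (long long)ts.tv_sec, ts.tv_nsec);
}

int main(void)
{
    show("CLOCK_REALTIME", CLOCK_REALTIME);
    show("CLOCK_MONOTONIC", CLOCK_MONOTONIC);
    show("CLOCK_PROCESS_CPUTIME_ID", CLOCK_PROCESS_CPUTIME_ID);
    return 0;
}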

On Linux, from the application point of view, the time(7) man page gives a good explanation.
Linux also provides the (Linux-specific) timerfd_create(2) and related syscalls.
You should not care about interrupts (they are the kernel's business and are configured dynamically, e.g. through application timers set up with timer_create(2), poll(2) and many other syscalls, and by the scheduler), but only about the application-visible time-related syscalls.
If some process sets up a timer with a tiny period of e.g. 10 ms, the kernel will probably increase the frequency of timer interrupts to 100 Hz.
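As an illustration of such an application timer, here is a sketch using the timerfd_create(2) interface mentioned above, with the 10 ms period chosen purely as an example:

/* Sketch: a periodic 10 ms timer via the Linux-specific timerfd interface.
 * read() on the fd blocks until the timer expires and returns the number of
 * expirations since the previous read. */
#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/timerfd.h>

int main(void)
{
    int fd = timerfd_create(CLOCK_MONOTONIC, 0);
    if (fd < 0) { perror("timerfd_create"); return 1; }

    struct itimerspec its = {
        .it_value    = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 },  /* first expiry in 10 ms */
        .it_interval = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 },  /* then every 10 ms */
    };
    if (timerfd_settime(fd, 0, &its, NULL) < 0) { perror("timerfd_settime"); return 1; }

    for (int i = 0; i < 5; i++) {
        uint64_t expirations;
        if (read(fd, &expirations, sizeof expirations) == (ssize_t)sizeof expirations)
            printf("tick %d (%llu expiration(s))\n", i, (unsigned long long)expirations);
    }
    close(fd);
    return 0;
}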
On recent kernels, you probably want the
CONFIG_HIGH_RES_TIMERS=y
CONFIG_TIMERFD=y
CONFIG_HPET_TIMER=y
CONFIG_PREEMPT=y
options in your kernel's .config file.
BTW, you could run cat /proc/interrupts twice with a 10-second interval between the runs. On my laptop with a home-built 3.16 kernel (mostly idle processes, but a Firefox browser and an Emacs running), I'm getting about 25 interrupts per second. Try also cat /proc/timer_list and cat /proc/timer_stats.
Look also in the Documentation/timers/ directory of a recent (e.g. 3.16) Linux kernel tree.
The kernel probably uses hardware devices such as (for PC laptops and desktops) the on-chip HPET or the TSC, which are much better than the old battery-backed RTC. Of course, the details are hardware-specific, so on ARM-based Linux systems (e.g. your Android smartphone) it is different.

Related

How to use jiffies in linux?

I understand what jiffies are and how to get their value in Linux, but I don't understand their purpose and how this value could be used. Why do we even need it in the first place? Could someone please explain?
Thanks,
By and large, you don't need to use jiffies. They're an implementation detail of how the Linux kernel keeps track of time, and somewhat obsolete at that. Quoting man 7 time:
The software clock, HZ, and jiffies
The accuracy of various system calls that set timeouts, (e.g.,
select(2), sigtimedwait(2)) and measure CPU time (e.g.,
getrusage(2)) is limited by the resolution of the software clock, a
clock maintained by the kernel which measures time in jiffies. The
size of a jiffy is determined by the value of the kernel constant
HZ.
...
High-resolution timers
Since Linux 2.6.21, Linux supports high-resolution timers (HRTs),
optionally configurable via CONFIG_HIGH_RES_TIMERS. On a system
that supports HRTs, the accuracy of sleep and timer system calls
is no longer constrained by the jiffy, but instead can be as accurate
as the hardware allows (microsecond accuracy is typical of modern
hardware).
Instead of using jiffies, just use the higher-level calls like gettimeofday(2), which work in more standardized units like seconds.
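For example, here is a minimal sketch that measures an elapsed interval in seconds and microseconds rather than jiffies (plain POSIX; nothing here depends on the kernel's HZ value):

/* Sketch: measure an elapsed interval with gettimeofday() instead of jiffies. */
#include <stdio.h>
#include <unistd.h>
#include <sys/time.h>

int main(void)
{
    struct timeval start, end;
    gettimeofday(&start, NULL);

    sleep(1);   /* stand-in for the work being timed */

    gettimeofday(&end, NULL);
    double elapsed = (end.tv_sec - start.tv_sec)
                   + (end.tv_usec - start.tv_usec) / 1e6;
    printf("elapsed: %.6f s\n", elapsed);
    return 0;
}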

How to measure total boottime for linux kernel on intel rangeley board

I am working on an Intel Rangeley board. I want to measure the total time taken to boot the Linux kernel. Is there any possible and proven way to achieve this on an Intel board?
Try using rdtsc. According to the Intel insn ref manual:
The processor monotonically increments the time-stamp counter MSR
every clock cycle and resets it to 0 whenever the processor is reset.
See “Time Stamp Counter” in Chapter 17 of the Intel® 64 and IA-32
Architectures Software Developer’s Manual, Volume 3B, for specific
details of the time stamp counter behavior.
(see the x86 tag wiki for links to manuals)
Normally the TSC is only used for relative measurements between two points in time, or as a timesource. For this use, though, the absolute value is meaningful, since the counter resets to 0 when the processor is reset. It ticks at the CPU's rated clock speed, regardless of the power-saving clock speed it's actually running at.
You might need to make sure you read the TSC from the boot CPU on a multicore system. The other cores might not have started their TSCs until Linux sent them an inter-processor interrupt to start them up. Linux might sync their TSCs to the boot CPU's TSC, since gettimeofday() does use the TSC. IDK, I'm just writing down stuff I'd be sure to check on if I wanted to do this myself.
You may need to take precautions to avoid having the kernel modify the TSC when using it as a timesource, probably via a boot option that forces Linux to use a different timesource.
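For reference, here is a user-space sketch of reading the TSC with the GCC/Clang intrinsic (x86 only; the rated frequency used for the conversion below is a placeholder you would have to determine for your own board):

/* Sketch: read the raw TSC.  Interpreting it as "time since reset" only works
 * if nothing (firmware, kernel, other cores) has written the counter, as
 * discussed above. */
#include <stdio.h>
#include <stdint.h>
#include <x86intrin.h>   /* __rdtsc() */

int main(void)
{
    uint64_t tsc = __rdtsc();
    double tsc_hz = 2.4e9;   /* placeholder: your CPU's rated (invariant TSC) frequency */
    printf("TSC = %llu cycles (~%.3f s at %.2f GHz)\n",
           (unsigned long long)tsc, tsc / tsc_hz, tsc_hz / 1e9);
    return 0;
}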

Evaluating SMI (System Management Interrupt) latency on Linux-CentOS/Intel machine

I am interested in evaluating the behavior (latency, frequency) of SMI handling on a Linux machine running CentOS and used for a (very) soft real-time application.
What tools are recommended (hwlatdetect for CentOS?), and what is the best course of action to go about this?
If no good tools are available for CentOS, am I correct to assume that installing a different OS on the same machine should yield the same results, since the underlying hardware/BIOS are the same?
Is there any source for ballpark figures on these parameters?
The machines are X86_64 architecture, running CentOS 6.4 (kernel 2.6.32-358.23.2.el2.centos.plus.x86_64.)
SMIs can certainly happen during normal operation. My home desktop has a chipset-driven SMI every second and a half, which is enabled in the chipset. I've also seen some servers that have them twice a second due to a BIOS-driven CPU frequency scaling scheme. However, some systems can go long periods of time without an SMI occurring, so it really depends.
Question #1: hwlatdetect is one option to detect the latency of SMIs occurring on your system. BIOSBITS, a bootable CD that can identify whether SMIs are occurring, is another option. You can also write your own test by creating a kernel module that spins in a loop and takes timestamps (using RDTSC). If you see a long gap between two timestamp readings, you could consult CPU MSR 0x34 to see if the SMI counter incremented, which would indicate that an SMI happened (see the sketch after question #3 below).
If you want to generate an SMI, you can make a kernel module that executes an OUT CPU instruction to port 0xB2, e.g. writing a value of 0 to this port. (You can also time this SMI by gathering a timestamp just before and just after the write to port 0xB2.)
Question #2: SMIs operate at a layer below the OS, so which OS you choose shouldn't have any impact.
Question #3: BIOSBITS recommends that SMI latencies be kept under 150 microseconds.
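As a user-space variant of the MSR check from question #1 (a sketch, not a polished tool: it assumes an Intel CPU where MSR 0x34 is the SMI counter, the msr kernel module loaded via modprobe msr, and root privileges):

/* Sketch: read the SMI counter (MSR 0x34) through /dev/cpu/0/msr.
 * MSRs are read by pread()ing 8 bytes at the file offset equal to the MSR number. */
#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/cpu/0/msr", O_RDONLY);
    if (fd < 0) { perror("open /dev/cpu/0/msr"); return 1; }

    uint64_t smi_count;
    if (pread(fd, &smi_count, sizeof smi_count, 0x34) != (ssize_t)sizeof smi_count) {
        perror("pread MSR 0x34");
        return 1;
    }
    printf("SMIs since reset: %llu\n", (unsigned long long)smi_count);
    close(fd);
    return 0;
}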
An SMI will put your system into SMM (System Management Mode), which postpones the normal execution of the kernel for the duration of the SMI handling. In other words, SMM is neither the real mode nor the protected mode we know from normal kernel operation; instead it executes special code kept in SMRAM (provided by the BIOS/firmware). To measure its latency you can try to trigger an SMI (it can be software-generated) and catch the total time spent in SMM mode. To accomplish this you can write a Linux kernel module, because you'll require some special privileges to issue an SMI (I think).
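For illustration, here is a user-space variant of that idea (instead of the kernel module suggested above) that triggers a software SMI through port 0xB2 and brackets it with RDTSC. It assumes x86, root privileges, ioperm(2), and firmware that actually services the APM command port; treat it as a sketch, not a safe production probe, since the write can have firmware-dependent side effects:

/* Sketch: time a software-generated SMI by writing 0 to port 0xB2 (APM command
 * port) and reading the TSC before and after.  x86-only, needs root; a kernel
 * module is the safer route. */
#include <stdio.h>
#include <stdint.h>
#include <sys/io.h>      /* ioperm(), outb() */
#include <x86intrin.h>   /* __rdtsc() */

int main(void)
{
    if (ioperm(0xB2, 1, 1) != 0) { perror("ioperm"); return 1; }

    uint64_t t0 = __rdtsc();
    outb(0x00, 0xB2);    /* the value written is firmware-defined; 0 as in the text above */
    uint64_t t1 = __rdtsc();

    printf("outb to port 0xB2 took %llu TSC cycles\n", (unsigned long long)(t1 - t0));
    return 0;
}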
For real-time systems I think it's best if you can avoid this sort of interrupt, such as SMIs, altogether.
You can check whether System Management Interrupts (SMI) are serviced or not with turbostat. For example:
# turbostat sleep 120
[check column SMI for value greater than 0]
Of course, from that you can also compute an SMI frequency.
Knowing that SMIs are actually happening at a certain rate is important information. But you also want to know how much time System Management Mode (SMM) spends in those interrupts. For example, if an SMI interruption is only very short, then it might be irrelevant for your realtime application. On the other hand, if you have hardware with long SMI interruptions you probably want to talk to the vendor, configure the firmware differently (if possible) and/or switch to other hardware with a less intrusive SMM.
The perf tool has a mode that measures how many cycles are spent in SMM during SMIs (using the information provided by certain CPU counters). Example:
# perf stat -a -A --smi-cost -- sleep 120
Performance counter stats for 'system wide':

               SMI cycles%          SMI#
CPU0                  0.0%             0
CPU1                  0.0%             0
CPU2                  0.0%             0
CPU3                  0.0%             0

     120.002927948 seconds time elapsed
You can also look at the raw values with:
# perf stat -a -A --smi-cost --metric-only -- sleep 120
From that you can compute how much time an SMI takes on average on your machine (divide the SMM cycle count by the number of SMIs and by the number of cycles per time unit, i.e. the CPU frequency).
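For example, with purely hypothetical numbers: if the raw counters reported 12,000,000 SMM cycles and 100 SMIs over the 120-second window on a 3 GHz machine, the average cost per SMI would be 12,000,000 / 100 / 3e9, i.e. about 40 microseconds.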
It certainly makes sense to cross-check the CPU-counter-based results with empirical ones.
You can use the Linux Hardware Latency Detector that is integrated in the Linux Kernel. Usage example:
# echo hwlat > /sys/kernel/debug/tracing/current_tracer
# echo 1 > /sys/kernel/debug/tracing/tracing_thresh
# watch -d -n 5 cat /sys/kernel/debug/tracing/tracing_max_latency
# echo "Don't forget to disable it again"
# echo nop > /sys/kernel/debug/tracing/current_tracer
Those tools are available on CentOS/RHEL 7 and should be available on other distributions, as well.
Regarding ballpark figures: recently I came across an HP 2011-ish ProLiant Gen8 Xeon server that fires 504 SMIs per minute. Perf computes a rate of 0.1% in SMM, and based on the counter values the average time spent in an SMI is as high as several microseconds - but the Linux hwlat detector doesn't detect such high interruptions on that system.
That SMI rate matches what HP documents in its Configuring and tuning
HPE ProLiant Servers for low-latency applications guide (October, 2017):
Disabling System Management Interrupts to the processor provides one of
the greatest benefits to low-latency environments.
Disabling the Processor Power and Utilization Monitoring SMI has the greatest
effect because it generates a processor interrupt eight times a second in G6
and later servers.
(emphasis mine; and that guide also documents other SMI sources)
On a Supermicro board with Intel Atom C3758 and an Intel NUC (i5-4250U) system of mine there are exactly zero SMIs counted.
On an Intel i7-6600U based Dell laptop, the system reports 8 SMIs per minute, but the aperf counter is lower than the (unhalted) cycles counter which isn't supposed to happen.
According to the Wikipedia page on System Management Mode, SMI is not used during normal operation, except perhaps to emulate a PS/2 keyboard with a USB physical keyboard.
And most Linux systems are able to drive a genuine USB keyboard without that emulation. You could configure your BIOS to disable it.
Actually, though, SMI is used for more than just keyboard emulation. Servers use SMI to report and correct ECC memory errors, ACPI uses SMI to communicate with the BIOS and perform some tasks, even enabling and disabling ACPI is done through SMI, and the BIOS often intercepts power state changes through SMI... there's more; these are just a few examples.

How to get a millisecond precision uptime from user-space in Linux?

I'm working on a Raspberry Pi based project that has a GPS module, from which my boss wants me to get the time for the system clock. However, we also need to take readings from different sensors while the GPS may not have a fix, and we need to know to millisecond precision (a tolerance of 50-100 ms is fine) when these readings were taken.
Personally I want a hardware RTC for this, but I've been instructed to work around it. My idea is to mark each reading with a relative time from system boot, since the system time is not reliable and is updated from NTP/satellite time when available (I can then fix up the records using the relative time once a synchronized time is available).
So, how can I get a millisecond-precise uptime in Linux from user-space C code? Something like the jiffies value available in the kernel would be perfect.
I think you have to check the main controller (CPU) on your board. Usually there will be a hardware timer module integrated into the CPU, or a decrementer (DEC) register implemented in the CPU core.
If there is a hardware timer or DEC register on your CPU, then use it to implement a periodic interrupt (the frequency can be 1000 Hz or something else). The interrupt handler can notify/wake up the user-space process to do the necessary real-time work.
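That said, a purely user-space approach (not part of the answer above, just the standard POSIX/Linux clock API) is to read a monotonic clock and convert it to milliseconds. CLOCK_MONOTONIC counts from an unspecified point near boot and is not stepped when NTP/GPS sets the wall clock; the Linux-specific CLOCK_BOOTTIME additionally counts time spent in suspend:

/* Sketch: millisecond timestamps relative to (roughly) boot, independent of
 * wall-clock settings.  Swap in CLOCK_BOOTTIME if suspend time must count too. */
#include <stdio.h>
#include <stdint.h>
#include <time.h>

static uint64_t uptime_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000u + (uint64_t)ts.tv_nsec / 1000000u;
}

int main(void)
{
    printf("monotonic time: %llu ms\n", (unsigned long long)uptime_ms());
    return 0;
}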

How is the microsecond time of linux gettimeofday() obtained and what is its accuracy?

Wall clock time is usually provided by the system's RTC. This mostly only provides times down to the millisecond range and typically has a granularity of 10-20 milliseconds. However the resolution/granularity of gettimeofday() is often reported to be in the few-microseconds range. I assume the microsecond granularity must be taken from a different source.
How is the microsecond resolution/granularity of gettimeofday() accomplished?
When the part down to the millisecond is taken from the RTC and the microseconds are taken from different hardware, a problem with the phasing of the two sources arises. The two sources have to be synchronized somehow.
How is the synchronization/phasing between these two sources accomplished?
Edit: From what I've read in links provided by amdn, particularly the following Intel link, I would add a question here:
Does gettimeofday() provide resolution/granularity in the microsecond regime at all?
Edit 2: Summarizing amdn's answer with some more results of reading:
Linux only uses the real-time clock (RTC) at boot time to synchronize with a higher resolution counter, e.g. the Time Stamp Counter (TSC). After boot, gettimeofday() returns a time which is entirely based on the TSC value and the frequency of this counter. The initial value for the TSC frequency is corrected/calibrated by comparing the system time to an external time source. The adjustment is done/configured by the adjtimex() function. The kernel operates a phase-locked loop to ensure that the time results are monotonic and consistent.
This way it can be stated that gettimeofday() has microsecond resolution. Taking into account that more modern Time Stamp Counters run in the GHz regime, the obtainable resolution could be in the nanosecond regime. Therefore this meaningful comment
/**
 * do_gettimeofday - Returns the time of day in a timeval
 * @tv: pointer to the timeval to be set
 *
 * NOTE: Users should be converted to using getnstimeofday()
 */
can be found in Linux/kernel/time/timekeeping.c. This suggests that there will possibly be an even higher resolution function available at a later point in time. Right now getnstimeofday() is only available in kernel space.
However, looking through all the code involved in getting this about right shows quite a few comments about uncertainties. It may be possible to obtain microsecond resolution. The function gettimeofday() may even show a granularity in the microsecond regime. But: there are severe doubts about its accuracy, because the drift of the TSC frequency cannot be accurately corrected for. Also, the complexity of the code dealing with this matter inside Linux is a hint that it's in fact too difficult to get it right. This is particularly, but not solely, caused by the huge number of hardware platforms Linux is supposed to run on.
Result: gettimeofday() returns monotonic time with microsecond granularity, but the time it provides is almost never in phase to within one microsecond with any other time source.
How is the microsecond resolution/granularity of gettimeofday() accomplished?
Linux runs on many different hardware platforms, so the specifics differ. On a modern x86 platform Linux uses the Time Stamp Counter, also known as the TSC, which is driven by a multiple of a crystal oscillator running at 133.33 MHz. The crystal oscillator provides a reference clock to the processor, and the processor multiplies it by some multiple - for example, on a 2.93 GHz processor the multiple is 22. The TSC historically was an unreliable source of time because implementations would stop the counter when the processor went to sleep, or because the multiple wasn't constant as the processor shifted multipliers to change performance states or throttled down when it got hot. Modern x86 processors provide a TSC that is constant, invariant, and non-stop. On these processors the TSC is an excellent high-resolution clock, and the Linux kernel determines an initial approximate frequency at boot time. The TSC provides microsecond resolution for the gettimeofday() system call and nanosecond resolution for the clock_gettime() system call.
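As a rough way to see the granularity you actually get on a given machine (a sketch, not from the original answer): call the clock twice back to back, many times, and record the smallest non-zero step.

/* Sketch: observe the apparent granularity of gettimeofday() and of
 * clock_gettime(CLOCK_REALTIME) by finding the smallest non-zero increment. */
#include <stdio.h>
#include <stdint.h>
#include <sys/time.h>
#include <time.h>

int main(void)
{
    struct timeval a, b;
    int64_t min_us = INT64_MAX;
    for (int i = 0; i < 1000000; i++) {
        gettimeofday(&a, NULL);
        gettimeofday(&b, NULL);
        int64_t d = (int64_t)(b.tv_sec - a.tv_sec) * 1000000 + (b.tv_usec - a.tv_usec);
        if (d > 0 && d < min_us)
            min_us = d;
    }
    printf("smallest observed gettimeofday() step: %lld us\n", (long long)min_us);

    struct timespec s, t;
    int64_t min_ns = INT64_MAX;
    for (int i = 0; i < 1000000; i++) {
        clock_gettime(CLOCK_REALTIME, &s);
        clock_gettime(CLOCK_REALTIME, &t);
        int64_t d = (int64_t)(t.tv_sec - s.tv_sec) * 1000000000 + (t.tv_nsec - s.tv_nsec);
        if (d > 0 && d < min_ns)
            min_ns = d;
    }
    printf("smallest observed clock_gettime(CLOCK_REALTIME) step: %lld ns\n", (long long)min_ns);
    return 0;
}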
How is this synchronization accomplished?
Your first question was about how the Linux clock provides high resolution; this second question is about synchronization. This is the distinction between precision and accuracy. Most systems have a clock that is backed up by a battery to keep the time of day while the system is powered down. As you might expect, this clock doesn't have high accuracy or precision, but it will get the time of day "in the ballpark" when the system starts. To get accuracy, most systems use an optional component to get time from an external source on the network. Two common ones are
Network Time Protocol
Precision Time Protocol
These protocols define a master clock on the network (or a tier of clocks sourced by an atomic clock) and then measure network latencies to estimate offsets from the master clock. Once the offset from the master is determined, the system clock is disciplined to keep it accurate. This can be done by
Stepping the clock (a relatively large, abrupt, and infrequent time adjustment), or
Slewing the clock (slowly increasing or decreasing the clock frequency over a given time period until the offset is corrected)
The kernel provides the adjtimex system call to allow clock disciplining. For details on how modern Intel multi-core processors keep the TSC synchronized between cores see CPU TSC fetch operation especially in multicore-multi-processor environment.
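To see the current disciplining state from user space, here is a sketch using the glibc adjtimex() wrapper; with modes set to 0 the call is a read-only query and doesn't adjust anything:

/* Sketch: query the kernel's clock-discipline state via adjtimex(). */
#include <stdio.h>
#include <sys/timex.h>

int main(void)
{
    struct timex tx = { .modes = 0 };   /* 0 = read-only query */
    int state = adjtimex(&tx);
    if (state < 0) { perror("adjtimex"); return 1; }

    printf("clock state: %d (TIME_OK is %d)\n", state, TIME_OK);
    printf("offset: %ld, freq: %ld, estimated error: %ld us, status: 0x%x\n",
           tx.offset, tx.freq, tx.esterror, (unsigned)tx.status);
    return 0;
}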
The relevant kernel source files for clock adjustments are kernel/time.c and kernel/time/timekeeping.c.
When Linux starts, it initializes the software clock using the hardware clock. See the chapter How Linux Keeps Track of Time in the Clock HOWTO.

Resources