How is the microsecond time of linux gettimeofday() obtained and what is its accuracy? - linux

Wall clock time is usually provided by the system's RTC. This mostly only provides times down to the millisecond range and typically has a granularity of 10-20 milliseconds. However the resolution/granularity of gettimeofday() is often reported to be in the few microseconds range. I assume the microsecond granularity must be taken from a different source.
How is the microsecond resolution/granularity of gettimeofday() accomplished?
When the part down to the millisecond is taken from the RTC and the microseconds are taken from different hardware, a problem with the phasing of the two sources arises. The two sources have to be synchronized somehow.
How is the synchronization/phasing between these two sources accomplished?
Edit: From what I've read in the links provided by amdn, particularly the following Intel link, I would add a question here:
Does gettimeofday() provide resolution/granularity in the microsecond regime at all?
Edit 2: Summarizing amdn's answer with some more results of reading:
Linux only uses the real-time clock (RTC) at boot time to synchronize with a higher resolution counter, e.g. the Time Stamp Counter (TSC). After boot gettimeofday() returns a time which is entirely based on the TSC value and the frequency of this counter. The initial value for the TSC frequency is corrected/calibrated by comparing the system time to an external time source. The adjustment is done/configured by the adjtimex() function. The kernel operates a phase locked loop to ensure that the time results are monotonic and consistent.
This way it can be stated that gettimeofday() has microsecond resolution. Taking into account that more modern Time Stamp Counters run in the GHz regime, the obtainable resolution could be in the nanosecond regime. Therefore this meaningful comment
/**
 * do_gettimeofday - Returns the time of day in a timeval
 * @tv: pointer to the timeval to be set
 *
 * NOTE: Users should be converted to using getnstimeofday()
 */
can be found in Linux/kernel/time/timekeeping.c. This suggests that an even higher resolution function may become available at a later point in time. Right now getnstimeofday() is only available in kernel space.
However, looking through all the code involved to get this roughly right shows quite a few comments about uncertainties. It may be possible to obtain microsecond resolution, and gettimeofday() may even show a granularity in the microsecond regime. But there are serious doubts about its accuracy, because the drift of the TSC frequency cannot be accurately corrected for. Also, the complexity of the code dealing with this matter inside Linux is a hint that it is in fact very difficult to get right. This is particularly, but not solely, caused by the huge number of hardware platforms Linux is supposed to run on.
Result: gettimeofday() returns monotonic time with microsecond granularity, but the time it provides is almost never in phase to within one microsecond with any other time source.
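A quick way to see that granularity in practice, only as a rough sketch (the observed step depends on which clocksource the kernel selected), is to call gettimeofday() in a tight loop and print the smallest step it reports:

#include <stdio.h>
#include <sys/time.h>

int main(void)
{
    struct timeval prev, cur;
    gettimeofday(&prev, NULL);

    /* Spin until the reported time changes, then print the step size. */
    for (int i = 0; i < 10; i++) {
        do {
            gettimeofday(&cur, NULL);
        } while (cur.tv_sec == prev.tv_sec && cur.tv_usec == prev.tv_usec);

        long step_us = (cur.tv_sec - prev.tv_sec) * 1000000L
                     + (cur.tv_usec - prev.tv_usec);
        printf("step: %ld us\n", step_us);
        prev = cur;
    }
    return 0;
}

On a machine using the TSC clocksource this typically prints 1 us steps; on a coarser clocksource the steps are much larger.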

How is the microsecond resolution/granularity of gettimeofday() accomplished?
Linux runs on many different hardware platforms, so the specifics differ. On a modern x86 platform Linux uses the Time Stamp Counter, also known as the TSC, which is driven by a multiple of a crystal oscillator running at 133.33 MHz. The crystal oscillator provides a reference clock to the processor, and the processor multiplies it by some multiple - for example on a 2.93 GHz processor the multiple is 22. The TSC historically was an unreliable source of time because implementations would stop the counter when the processor went to sleep, or because the multiple wasn't constant as the processor shifted multipliers to change performance states or throttled down when it got hot. Modern x86 processors provide a TSC that is constant, invariant, and non-stop. On these processors the TSC is an excellent high resolution clock and the Linux kernel determines an initial approximate frequency at boot time. The TSC provides microsecond resolution for the gettimeofday() system call and nanosecond resolution for the clock_gettime() system call.
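To illustrate the difference in reported resolution, here is a minimal sketch (assuming Linux with glibc; older systems may need -lrt) that reads both interfaces back to back, plus the resolution the kernel advertises:

#include <stdio.h>
#include <time.h>
#include <sys/time.h>

int main(void)
{
    struct timeval tv;
    struct timespec ts, res;

    gettimeofday(&tv, NULL);              /* timeval only carries microseconds */
    clock_gettime(CLOCK_REALTIME, &ts);   /* timespec carries nanoseconds */
    clock_getres(CLOCK_REALTIME, &res);   /* resolution reported by the kernel */

    printf("gettimeofday : %ld.%06ld s\n", (long)tv.tv_sec, (long)tv.tv_usec);
    printf("clock_gettime: %ld.%09ld s\n", (long)ts.tv_sec, (long)ts.tv_nsec);
    printf("clock_getres : %ld ns\n", (long)res.tv_nsec);
    return 0;
}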
How is this synchronization accomplished?
Your first question was about how the Linux clock provides high resolution; this second question is about synchronization. This is the distinction between precision and accuracy. Most systems have a clock that is backed up by a battery to keep the time of day when the system is powered down. As you might expect this clock doesn't have high accuracy or precision, but it will get the time of day "in the ballpark" when the system starts. To get accuracy most systems use an optional component to get time from an external source on the network. Two common ones are
Network Time Protocol
Precision Time Protocol
These protocols define a master clock on the network (or a tier of clocks sourced by an atomic clock) and then measure network latencies to estimate offsets from the master clock. Once the offset from the master is determined the system clock is disciplined to keep it accurate. This can be done by
Stepping the clock (a relatively large, abrupt, and infrequent time adjustment), or
Slewing the clock (adjusting the clock frequency slightly up or down so that the time is corrected gradually over a given period)
The kernel provides the adjtimex system call to allow clock disciplining. For details on how modern Intel multi-core processors keep the TSC synchronized between cores see CPU TSC fetch operation especially in multicore-multi-processor environment.
The relevant kernel source files for clock adjustments are kernel/time.c and kernel/time/timekeeping.c.
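For illustration only, a read-only adjtimex() call (modes = 0) shows the offset and frequency correction the kernel is currently applying; this is a sketch of the query side, not a disciplining daemon such as ntpd or chronyd:

#include <stdio.h>
#include <sys/timex.h>

int main(void)
{
    struct timex tx = { 0 };      /* modes = 0 means: read current values, don't adjust */

    int state = adjtimex(&tx);
    if (state == -1) {
        perror("adjtimex");
        return 1;
    }

    /* freq is in parts per million with a 16-bit fractional part;
       offset is in microseconds (nanoseconds if STA_NANO is set). */
    printf("clock state : %d\n", state);
    printf("offset      : %ld\n", tx.offset);
    printf("freq adjust : %.3f ppm\n", tx.freq / 65536.0);
    printf("est. error  : %ld us\n", tx.esterror);
    return 0;
}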

When Linux starts, it initializes the software clock using the hardware clock. See the chapter How Linux Keeps Track of Time in the Clock HOWTO.

Related

How to use jiffies in linux?

I understand what jiffies are and how to get their values in Linux, but I don't understand their purpose and how this value could be used. Why do we even need it in the first place? Could someone please explain it to me?
Thanks,
By and large, you don't need to use jiffies. They're an implementation detail of how the Linux kernel keeps track of time, and somewhat obsolete at that. Quoting man 7 time:
The software clock, HZ, and jiffies
The accuracy of various system calls that set timeouts (e.g., select(2), sigtimedwait(2)) and measure CPU time (e.g., getrusage(2)) is limited by the resolution of the software clock, a clock maintained by the kernel which measures time in jiffies. The size of a jiffy is determined by the value of the kernel constant HZ.
...
High-resolution timers
Since Linux 2.6.21, Linux supports high-resolution timers (HRTs), optionally configurable via CONFIG_HIGH_RES_TIMERS. On a system that supports HRTs, the accuracy of sleep and timer system calls is no longer constrained by the jiffy, but instead can be as accurate as the hardware allows (microsecond accuracy is typical of modern hardware).
Instead of using jiffies, just use the higher-level calls like gettimeofday(2) which work in more standardized units like seconds.
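For example, instead of counting jiffies you can simply take two gettimeofday() readings around the work you want to time (timersub() is a BSD-derived convenience macro provided by glibc; this is just a sketch assuming it is available):

#include <stdio.h>
#include <unistd.h>
#include <sys/time.h>

int main(void)
{
    struct timeval start, end, elapsed;

    gettimeofday(&start, NULL);
    usleep(1500);                       /* stands in for the work being timed */
    gettimeofday(&end, NULL);

    timersub(&end, &start, &elapsed);   /* elapsed = end - start */
    printf("elapsed: %ld.%06ld s\n", (long)elapsed.tv_sec, (long)elapsed.tv_usec);
    return 0;
}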

How do I know if my CPU support high resolution timers?

As part of the Linux kernel course we are told that high resolution timers may or may not be supported by the hardware, and that the hardware that affects this support is only the CPU.
So I took my time and opened the specs of one of the Intel CPUs.
I am trying to understand if, by reading the specs of a given CPU, I can determine whether the OS can support high resolution timers.
In that specific manual I am uncertain what to look for, but my first guess is the "Clock Topology" section (2.6 in the link).
The section lists under it three types of HW clocks:
Base Clock reference clock (100MHz), PCIe reference clock and fixed clock 38.4MHz.
Now if high resolution clock support is based solely on the hardware, and not on some complex computation over multiple clocks and so forth, then the base reference clock's 100 MHz corresponds to 10 nanoseconds, not 1. High resolution timers supposedly support 1 nanosecond resolution.
I assume high-end Intel CPUs do support high resolution timers, but it seems I lack the knowledge of how to read the manual and what is needed for that support.
Can someone elaborate? Does nanosecond resolution actually mean 1 nanosecond resolution? If this CPU does not support HR timers, what mechanisms are used to compensate for the lack of HW support? Can this information be obtained from the OS itself?
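Regarding the last point, the OS itself can be asked what it is actually delivering. A hedged sketch using the POSIX clock_getres() call (on Linux, /proc/timer_list also typically reports whether hrtimers are active):

#include <stdio.h>
#include <time.h>

int main(void)
{
    struct timespec res;

    /* The resolution reported here is what timers and sleeps are rounded to.
       With a high-resolution clocksource it is typically 1 ns; without one
       it can be as coarse as a jiffy (e.g. 4,000,000 ns at HZ=250). */
    if (clock_getres(CLOCK_MONOTONIC, &res) == 0)
        printf("CLOCK_MONOTONIC resolution: %ld s %ld ns\n",
               (long)res.tv_sec, (long)res.tv_nsec);
    return 0;
}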

Measuring time in assembly

For several hours now I have been trying to find a way to measure a time interval within assembly code. What I have seen so far is that I can query the number of CPU cycles, but of course I'd need to know the CPU frequency to translate the number of cycles into time. I have found the rdmsr instruction, but it is a ring 0 instruction, and ring 0 is not somewhere I can put my code.
Some examples I've found call the Windows Query* functions for this, but I am not running on Windows. Is there any way for me to measure a time interval at user level? Any other way to get the frequency, or maybe another clock I can access directly? A one-second resolution system clock is of course out of the question :)
I spent quite a while working with cycle counters, and eventually came to the (perhaps obvious) conclusion that RDTSC counts cycles, not time. It will never count time because the computer's clock is being constantly ramped up and down by the power management unit. So the cycle counter is extremely precise for measuring cycles, horribly off by random amounts in real time units. I believe Intel eventually addressed this by locking the cycle counter to a clock that is not affected by the PMU, but I haven't investigated it.
The Windows Query* functions do not actually use the RDTSC cycle counter. I thought they did until I tried to measure really small periods and found it had a 14MHz(?) tick, which turned out to be the PCI data bus clock.
On top of all this, each core has its own cycle counter, so you have to pay attention to which core you are using when executing the RDTSC opcode. And each core has its own PMU.
The best timer you will find in Windows user mode is QueryPerformanceCounter() and QueryPerformanceFrequency().
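Since the question is not about Windows: on Linux the equivalent user-level approach would be to read the TSC with the unprivileged rdtsc instruction and calibrate it once against clock_gettime(), keeping the caveats above in mind. A sketch using GCC inline assembly (it assumes x86, a constant TSC, and for serious use pinning the thread to one core):

#include <stdio.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

/* rdtsc is executable from user mode unless the kernel sets CR4.TSD. */
static inline uint64_t rdtsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}

int main(void)
{
    struct timespec t0, t1;
    uint64_t c0, c1;

    /* Calibrate cycles against a known wall-clock interval once. */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    c0 = rdtsc();
    usleep(100 * 1000);                        /* roughly 100 ms */
    c1 = rdtsc();
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns  = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    double ghz = (double)(c1 - c0) / ns;       /* cycles per nanosecond */
    printf("estimated TSC rate: %.3f GHz\n", ghz);
    return 0;
}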

What is the lowest time resolution for somewhat accurate measurements of cpu usage?

Some of the things I want to measure are very short, and I can only repeat them so many times if I don't run any of the setup/dispose code in the middle.
Note: on Linux, reading /proc/stat
Not very portable, and you'll have to take great care to make it reliable, but the Time Stamp Counter definitely has the highest resolution available (it increases on every CPU tick).
The time stamp counter has, until recently, been an excellent high-resolution, low-overhead way of getting CPU timing information. With the advent of multi-core/hyperthreaded CPUs, systems with multiple CPUs, and "hibernating" operating systems, the TSC cannot be relied on to provide accurate results - unless great care is taken to correct the possible flaws: rate of tick and whether all cores (processors) have identical values in their time-keeping registers. There is no promise that the timestamp counters of multiple CPUs on a single motherboard will be synchronized. In such cases, programmers can only get reliable results by locking their code to a single CPU. Even then, the CPU speed may change due to power-saving measures taken by the OS or BIOS, or the system may be hibernated and later resumed (resetting the time stamp counter). In those latter cases, to stay relevant, the counter must be recalibrated periodically (according to the time resolution your application requires).
There are some notes about Linux-specific solutions on the page, too:
Under Linux, similar functionality is provided by reading the value of the CLOCK_MONOTONIC clock using the POSIX clock_gettime function.
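In other words, a rough user-space sketch of that suggestion (CLOCK_PROCESS_CPUTIME_ID could be substituted if you want CPU time consumed by the process rather than elapsed wall time):

#include <stdio.h>
#include <time.h>

int main(void)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    /* ... the short operation you want to measure ... */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    long ns = (t1.tv_sec - t0.tv_sec) * 1000000000L + (t1.tv_nsec - t0.tv_nsec);
    printf("elapsed: %ld ns\n", ns);
    return 0;
}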

Time Stamp counter (TSC) when switching between Kernel & User mode

I am wondering if somebody knows some more details about the time stamp counter in Linux when a context switch occurs. Until now I was of the opinion that the TSC value just increases by 1 during each clock cycle, regardless of whether it is in kernel or in user mode. I measured the performance of an application using the TSC, which yielded a result of 5 million clock cycles. Then I made some changes to the scheduler, which means that a context switch takes considerably longer, e.g. 2 million cycles instead of 500,000 cycles. The funny bit is that when measuring the performance of the original application again, it still takes 5 million cycles... So I am wondering why it did not take considerably longer, given that a context switch now takes almost 2 million clock cycles more? (And at least 3 context switches occur during the execution of the application.)
Is the time stamp counter somehow deactivated during kernel mode? Or is the content of the TSC saved during context switches? Thanks if someone could point out to me what the problem could be!
As you can read on Wikipedia
With the advent of multi-core/hyperthreaded CPUs, systems with multiple CPUs, and "hibernating" operating systems, the TSC cannot be relied on to provide accurate results. The issue has two components: rate of tick and whether all cores (processors) have identical values in their time-keeping registers. There is no promise that the timestamp counters of multiple CPUs on a single motherboard will be synchronized. In such cases, programmers can only get reliable results by locking their code to a single CPU. Even then, the CPU speed may change due to power-saving measures taken by the OS or BIOS, or the system may be hibernated and later resumed (resetting the time stamp counter). Reliance on the time stamp counter also reduces portability, as other processors may not have a similar feature. Recent Intel processors include a constant rate TSC (identified by the constant_tsc flag in Linux's /proc/cpuinfo). With these processors the TSC reads at the processor's maximum rate regardless of the actual CPU running rate. While this makes time keeping more consistent, it can skew benchmarks, where a certain amount of spin-up time is spent at a lower clock rate before the OS switches the processor to the higher rate. This has the effect of making things seem like they require more processor cycles than they normally would.
I believe the TSC is actually a hardware construct of the processor you're using, i.e. reading the TSC actually uses the RDTSC processor opcode. I don't even think there's a way for the OS to alter its value; it just increases with each tick since the last power reset.
Regarding your modifications to the scheduler, is it possible that you're using a multi-core processor in a way that the OS is not switching out your running process? You might put a call to sched_yield() or sleep(0) in your program to see if your scheduler changes start taking effect.
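For instance, a small sketch along those lines, using the compiler's __rdtsc() intrinsic and a few explicit yields (this is only an illustration, not your measurement harness, and sched_yield() may return immediately if no other task is runnable):

#include <stdio.h>
#include <sched.h>
#include <x86intrin.h>        /* __rdtsc() intrinsic (GCC/clang) */

int main(void)
{
    unsigned long long before = __rdtsc();

    /* Give up the CPU a few times so the scheduler path is actually taken. */
    for (int i = 0; i < 3; i++)
        sched_yield();

    unsigned long long after = __rdtsc();
    printf("cycles across 3 yields: %llu\n", after - before);
    return 0;
}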
