For several hours now, I have been trying to find a way to measure a time interval within assembly code. What I have seen so far is that I can query the number of CPU cycles, but of course I'd need to know the CPU frequency to translate a number of cycles into time. I have found the rdmsr instruction, but it is a ring0 instruction, and ring0 is not somewhere I can put my code.
Some examples I've found call the Windows Query* functions for this, but I am not running on Windows. Is there any way for me to measure a time interval at user level? Any other way to get the frequency, or maybe another clock I can access directly? A one-second-resolution system clock is of course out of the question :)
I spent quite a while working with cycle counters, and eventually came to the (perhaps obvious) conclusion that RDTSC counts cycles, not time. It will never count time because the computer's clock is constantly being ramped up and down by the power management unit. So the cycle counter is extremely precise for measuring cycles, but horribly off by random amounts in real time units. I believe Intel eventually addressed this by locking the cycle counter to a clock that is not affected by the PMU, but I haven't investigated it.
The Windows Query* functions do not actually use the RDTSC cycle counter. I thought they did until I tried to measure really small periods and found it had a 14MHz(?) tick, which turned out to be the PCI data bus clock.
On top of all this, each core has its own cycle counter, so you have to pay attention to which core you are using when executing the RDTSC opcode. And each core has its own PMU.
The best timer you will find in Windows user mode is QueryPerformanceCounter() and QueryPerformanceFrequency().
In order to calculate the time we usually use a system call, but what if I wanted to implement it myself? Is that possible? Using rdtsc gives me the number of CPU clock cycles since the machine was turned on, but that alone is not enough to calculate the time, since I also need the CPU frequency. The basic problem is that not only is the CPU frequency I measure different from what I see in /proc/cpuinfo, it can also change over time (overclocking and underclocking). So how is the system call implemented?
The solution framework I'm looking for is to calculate the time from some initial time t0, the number of CPU clock cycles since t0 (using rdtsc), and the CPU frequency. Something like
t1 = t0 + (rdtsc1 - rdtsc0) / frequency
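A minimal sketch of that formula in C, assuming an invariant TSC and a frequency value you have calibrated yourself; the 3 GHz figure and the busy loop are placeholders, and __rdtsc() is the GCC/Clang intrinsic from <x86intrin.h>:

#include <stdio.h>
#include <stdint.h>
#include <x86intrin.h>   /* __rdtsc() intrinsic on GCC/Clang */

int main(void)
{
    const double tsc_hz = 3.0e9;      /* placeholder: TSC frequency you calibrated yourself */

    uint64_t rdtsc0 = __rdtsc();      /* cycle count at t0 */
    for (volatile int i = 0; i < 1000000; ++i)
        ;                             /* stand-in for the work being timed */
    uint64_t rdtsc1 = __rdtsc();      /* cycle count at t1 */

    double elapsed_s = (double)(rdtsc1 - rdtsc0) / tsc_hz;
    printf("elapsed: %.9f s (%llu cycles)\n",
           elapsed_s, (unsigned long long)(rdtsc1 - rdtsc0));
    return 0;
}

This only gives meaningful results if tsc_hz really matches the rate the counter ticks at, which is exactly the calibration problem the question describes.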
Most computers have a real-time clock separate from the CPU, e.g. in the "BIOS." The interrupt code for the real-time clock is 0x1A as you can see here: http://en.wikipedia.org/wiki/BIOS_interrupt_call
So if you were to implement something like time() you could do it simply by invoking the 0x1A interrupt to query the real-time clock. Of course this is somewhat platform-dependent.
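As a concrete (and very platform-specific) illustration, here is a hedged sketch for a 16-bit real-mode DOS compiler such as Turbo C or Borland C, which provide int86() in <dos.h>; INT 1Ah with AH=00h returns the BIOS tick count since midnight in CX:DX (about 18.2 ticks per second). This does not apply to user space on a modern protected-mode OS.

#include <stdio.h>
#include <dos.h>

int main(void)
{
    union REGS r;
    unsigned long ticks;

    r.h.ah = 0x00;            /* INT 1Ah function 00h: read system-timer tick count */
    int86(0x1A, &r, &r);      /* BIOS time-of-day services */

    ticks = ((unsigned long)r.x.cx << 16) | r.x.dx;      /* CX:DX = ticks since midnight */
    printf("ticks since midnight: %lu (~%lu s at ~18.2 ticks/s)\n",
           ticks, ticks * 10UL / 182UL);
    return 0;
}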
I'm thinking about writing a simple 8088 emulator, but I can't understand how to connect the 8088 core to the video subsystem.
I'm thinking about a main loop like this:
while (TRUE)
{
    execute_cpu_cycles_per_scanline();
    paint_scanline();
}
Is this method suitable for CPU and graphics emulation? Are there other methods? Is there a good explanation of why I can't use separate threads for the CPU and the video? How do emulators like QEMU or other x86 emulators deal with this problem?
Thanks.
Well, there are so many x86 processors, and as they have evolved over time the relationship between instructions and clock periods has become somewhat non-deterministic. For older CPUs like the 8088, 6502, etc., if the timings are documented and accurate, you can simply count the clock cycles for each instruction, and when the number of simulated clock cycles is equal to or greater than the scanline draw time, or some interrupt interval, or whatever, you can do what you are suggesting. If you look at MAME, for example, or other emulators, that is basically how they do it: use the instructions' clock cycles to determine the time elapsed, and from that manage emulated time in the peripherals.
Let's say instead that you want to run Linux on QEMU. You wouldn't want the emulated clock that tells time to be determined by the execution of instructions; you would want to sync that clock with the hardware system clock. Likewise, you might want to sync the refresh rates to the real hardware refresh rates rather than simulated ones.
So those are the two extremes. You will need to do one or the other, or something in between.
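For illustration, here is a hedged sketch of the first (cycle-counting) approach in C; emulate_one_instruction() and paint_scanline() are hypothetical hooks into an emulator core, stubbed here so the sketch stands alone, and the cycles-per-scanline figure is only a rough example for a 4.77 MHz 8088 driving a ~15.7 kHz line rate.

#include <stdint.h>

#define CYCLES_PER_SCANLINE 304   /* rough example: ~4.77 MHz / ~15.7 kHz line rate */

/* Hypothetical hooks into the emulator core; stubbed so the sketch compiles. */
static int emulate_one_instruction(void) { return 4; }  /* would return the cycles the decoded instruction took */
static void paint_scanline(void) { /* would render one line of the emulated screen */ }

void run_frame(int scanlines_per_frame)
{
    int64_t budget = 0;                            /* cycles still owed to the CPU this scanline */
    for (int line = 0; line < scanlines_per_frame; ++line) {
        budget += CYCLES_PER_SCANLINE;
        while (budget > 0)
            budget -= emulate_one_instruction();   /* may overshoot by a few cycles */
        paint_scanline();                          /* overshoot carries into the next line's budget */
    }
}

int main(void)
{
    run_frame(262);   /* roughly one NTSC-style frame worth of scanlines */
    return 0;
}

Carrying the per-line overshoot forward keeps the long-term emulated timing honest even though individual instructions rarely end exactly on a scanline boundary.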
Wall clock time is usually provided by the system's RTC. This mostly only provides times down to the millisecond range and typically has a granularity of 10-20 milliseconds. However, the resolution/granularity of gettimeofday() is often reported to be in the few-microseconds range. I assume the microsecond granularity must come from a different source.
How is the microsecond resolution/granularity of gettimeofday() accomplished?
When the part down to the millisecond is taken from the RTC and the microseconds are taken from different hardware, a problem with the phasing of the two sources arises. The two sources have to be synchronized somehow.
How is the synchronization/phasing between these two sources accomplished?
Edit: From what I've read in the links provided by amdn, particularly the following Intel link, I would add a question here:
Does gettimeofday() provide resolution/granularity in the microsecond regime at all?
Edit 2: Summarizing amdn's answer with some more results from my reading:
Linux only uses the real-time clock (RTC) at boot time to synchronize with a higher-resolution counter, e.g. the Time Stamp Counter (TSC). After boot, gettimeofday() returns a time which is entirely based on the TSC value and the frequency of this counter. The initial value for the TSC frequency is corrected/calibrated by comparing the system time to an external time source. The adjustment is done/configured by the adjtimex() function. The kernel operates a phase-locked loop to ensure that the time results are monotonic and consistent.
This way it can be stated that gettimeofday() has microsecond resolution. Taking into account that more modern Time Stamp Counters run in the GHz regime, the obtainable resolution could be in the nanosecond regime. Hence this meaningful comment
/**
 * do_gettimeofday - Returns the time of day in a timeval
 * @tv: pointer to the timeval to be set
 *
 * NOTE: Users should be converted to using getnstimeofday()
 */
can be found in Linux/kernel/time/timekeeping.c. This suggests that an even higher resolution function may become available at a later point in time. Right now getnstimeofday() is only available in kernel space.
However, looking through all the code involved in getting this right shows quite a few comments about uncertainties. It may be possible to obtain microsecond resolution, and gettimeofday() may even show a granularity in the microsecond regime. But there are severe doubts about its accuracy, because the drift of the TSC frequency cannot be accurately corrected for. Also, the complexity of the code dealing with this matter inside Linux is a hint that it is in fact too difficult to get right. This is particularly, but not solely, caused by the huge number of hardware platforms Linux is supposed to run on.
Result: gettimeofday() returns monotonic time with microsecond granularity, but the time it provides is almost never in phase to one microsecond with any other time source.
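One way to check the granularity part of this yourself is to call gettimeofday() in a tight loop and record the smallest nonzero step it reports; this is only a rough probe of the granularity you actually observe, and it says nothing about accuracy.

#include <stdio.h>
#include <sys/time.h>

int main(void)
{
    struct timeval prev, cur;
    long min_step_us = -1;

    gettimeofday(&prev, NULL);
    for (int i = 0; i < 1000000; ++i) {
        gettimeofday(&cur, NULL);
        long step = (cur.tv_sec - prev.tv_sec) * 1000000L
                  + (cur.tv_usec - prev.tv_usec);
        if (step > 0 && (min_step_us < 0 || step < min_step_us))
            min_step_us = step;              /* smallest nonzero step seen so far */
        prev = cur;
    }
    printf("smallest nonzero step: %ld us\n", min_step_us);
    return 0;
}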
How is the microsecond resolution/granularity of gettimeofday() accomplished?
Linux runs on many different hardware platforms, so the specifics differ. On a modern x86 platform Linux uses the Time Stamp Counter, also known as the TSC, which is driven by a multiple of a crystal oscillator running at 133.33 MHz. The crystal oscillator provides a reference clock to the processor, and the processor multiplies it by some multiplier - for example, on a 2.93 GHz processor the multiplier is 22. The TSC historically was an unreliable source of time because implementations would stop the counter when the processor went to sleep, or because the multiplier wasn't constant as the processor shifted multipliers to change performance states or throttled down when it got hot. Modern x86 processors provide a TSC that is constant, invariant, and non-stop. On these processors the TSC is an excellent high-resolution clock, and the Linux kernel determines an initial approximate frequency at boot time. The TSC provides microsecond resolution for the gettimeofday() system call and nanosecond resolution for the clock_gettime() system call.
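If you want to see the resolution the kernel itself reports for these interfaces, clock_getres() can be queried for each POSIX clock; a small sketch (on older glibc you may need to link with -lrt):

#include <stdio.h>
#include <time.h>

static void show(const char *name, clockid_t id)
{
    struct timespec res;
    if (clock_getres(id, &res) == 0)                       /* reported resolution of this clock */
        printf("%-16s resolution: %ld.%09ld s\n",
               name, (long)res.tv_sec, res.tv_nsec);
}

int main(void)
{
    show("CLOCK_REALTIME",  CLOCK_REALTIME);
    show("CLOCK_MONOTONIC", CLOCK_MONOTONIC);
    return 0;
}

On a typical TSC-backed x86 system both clocks report a 1 ns resolution, while gettimeofday() is limited to microseconds by its struct timeval interface.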
How is this synchronization accomplished?
Your first question was about how the Linux clock provides high resolution; this second question is about synchronization, and this is the distinction between precision and accuracy. Most systems have a clock that is backed up by a battery to keep the time of day while the system is powered down. As you might expect, this clock doesn't have high accuracy or precision, but it will get the time of day "in the ballpark" when the system starts. To get accuracy, most systems use an optional component to get time from an external source on the network. Two common ones are
Network Time Protocol
Precision Time Protocol
These protocols define a master clock on the network (or a tier of clocks sourced by an atomic clock) and then measure network latencies to estimate offsets from the master clock. Once the offset from the master is determined the system clock is disciplined to keep it accurate. This can be done by
Stepping the clock (a relatively large, abrupt, and infrequent time adjustment), or
Slewing the clock (a small adjustment to the clock frequency, slowly increasing or decreasing it over a given time period)
The kernel provides the adjtimex system call to allow clock disciplining. For details on how modern Intel multi-core processors keep the TSC synchronized between cores see CPU TSC fetch operation especially in multicore-multi-processor environment.
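For the curious, adjtimex() can also be called read-only from user space to inspect the current discipline state; a minimal sketch, with modes set to 0 so nothing is changed (field units follow the adjtimex(2) conventions):

#include <stdio.h>
#include <sys/timex.h>

int main(void)
{
    struct timex tx = { 0 };          /* modes = 0 -> query only, no adjustment is made */
    int state = adjtimex(&tx);        /* return value is the clock state, e.g. TIME_OK */

    printf("clock state: %d (TIME_OK is %d)\n", state, TIME_OK);
    printf("offset: %ld, freq: %ld (scaled ppm), tick: %ld us\n",
           tx.offset, tx.freq, tx.tick);
    return 0;
}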
The relevant kernel source files for clock adjustments are kernel/time.c and kernel/time/timekeeping.c.
When Linux starts, it initializes the software clock using the hardware clock. See the chapter How Linux Keeps Track of Time in the Clock HOWTO.
Some of the things I want to measure are very short, and I can only repeat them so many times if I don't run any of the setup/dispose code in the middle.
Note: on Linux, reading /proc/stat
It's not very portable and you'll have to take great care to make it reliable, but the Time Stamp Counter definitely has the highest resolution available (it increases on every CPU tick).
The time stamp counter has, until recently, been an excellent high-resolution, low-overhead way of getting CPU timing information. With the advent of multi-core/hyperthreaded CPUs, systems with multiple CPUs, and "hibernating" operating systems, the TSC cannot be relied on to provide accurate results - unless great care is taken to correct the possible flaws: rate of tick and whether all cores (processors) have identical values in their time-keeping registers. There is no promise that the timestamp counters of multiple CPUs on a single motherboard will be synchronized. In such cases, programmers can only get reliable results by locking their code to a single CPU. Even then, the CPU speed may change due to power-saving measures taken by the OS or BIOS, or the system may be hibernated and later resumed (resetting the time stamp counter). In those latter cases, to stay relevant, the counter must be recalibrated periodically (according to the time resolution your application requires).
There are some notes on that page about Linux-specific solutions, too:
Under Linux, similar functionality is provided by reading the value of CLOCK_MONOTONIC clock using POSIX clock_gettime function.
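A minimal sketch of what that looks like in practice, timing a placeholder busy loop with CLOCK_MONOTONIC (link with -lrt on older glibc):

#include <stdio.h>
#include <time.h>

int main(void)
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (volatile int i = 0; i < 1000000; ++i)   /* stand-in for the work being measured */
        ;
    clock_gettime(CLOCK_MONOTONIC, &end);

    double elapsed_ns = (end.tv_sec - start.tv_sec) * 1e9
                      + (end.tv_nsec - start.tv_nsec);
    printf("elapsed: %.0f ns\n", elapsed_ns);
    return 0;
}

Because CLOCK_MONOTONIC is not stepped by wall-clock adjustments, it is the sensible choice for interval measurements.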
I am wondering if somebody knows some more details about the time stamp counter in Linux when a context switch occurs. Until now I was of the opinion that the TSC value just increases by 1 during each clock cycle, regardless of whether we are in kernel or in user mode. I measured the performance of an application using the TSC, which yielded a result of 5 million clock cycles. Then I made some changes to the scheduler, which means that a context switch takes considerably longer, e.g. 2 million cycles instead of 500,000 cycles. The funny bit is that when measuring the performance of the original application again, it still takes 5 million cycles... So I am wondering why it did not take considerably longer, since a context switch now takes almost 2 million clock cycles more (and at least 3 context switches occur during execution of the application).
Is the time stamp counter somehow deactivated in kernel mode? Or is the content of the TSC saved during context switches? Thanks if someone could point out to me what the problem could be!
As you can read on Wikipedia
With the advent of multi-core/hyperthreaded CPUs, systems with multiple CPUs, and "hibernating" operating systems, the TSC cannot be relied on to provide accurate results. The issue has two components: rate of tick and whether all cores (processors) have identical values in their time-keeping registers. There is no promise that the timestamp counters of multiple CPUs on a single motherboard will be synchronized. In such cases, programmers can only get reliable results by locking their code to a single CPU. Even then, the CPU speed may change due to power-saving measures taken by the OS or BIOS, or the system may be hibernated and later resumed (resetting the time stamp counter). Reliance on the time stamp counter also reduces portability, as other processors may not have a similar feature. Recent Intel processors include a constant rate TSC (identified by the constant_tsc flag in Linux's /proc/cpuinfo). With these processors the TSC reads at the processor's maximum rate regardless of the actual CPU running rate. While this makes time keeping more consistent, it can skew benchmarks, where a certain amount of spin-up time is spent at a lower clock rate before the OS switches the processor to the higher rate. This has the effect of making things seem like they require more processor cycles than they normally would.
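Following the quote's advice about locking code to a single CPU, on Linux you can pin the measuring thread with sched_setaffinity() before using RDTSC; a hedged sketch that assumes CPU 0 exists and is allowed for this process:

#define _GNU_SOURCE
#include <stdio.h>
#include <stdint.h>
#include <sched.h>
#include <x86intrin.h>

int main(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(0, &set);                              /* run only on CPU 0 */
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }

    uint64_t t0 = __rdtsc();
    /* ... code under measurement ... */
    uint64_t t1 = __rdtsc();
    printf("cycles measured on CPU 0: %llu\n", (unsigned long long)(t1 - t0));
    return 0;
}

Pinning only guarantees that both RDTSC readings come from the same core's counter; it does not protect against frequency changes on older, non-invariant TSCs.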
I believe the TSC is actually a hardware construct of the processor you're using, i.e. reading the TSC actually uses the RDTSC processor opcode. I don't even think there's a way for the OS to alter its value; it just increases with each tick since the last power reset.
Regarding your modifications to the scheduler, is it possible that you're using a multi-core processor in a way that the OS is not switching out your running process? You might put a call to sched_yield() or sleep(0) in your program to see if your scheduler changes start taking effect.