Emulation of 8088 CPU with Video System

I am thinking about writing a simple 8088 emulator, but I can't understand how to connect the 8088 core to the video subsystem.
I am thinking of a main loop like this:
while (TRUE)
{
    execute_cpu_cycles_per_scanline();
    paint_scanline();
}
Is this method suitable for CPU and graphics emulation? Are there other methods? Is there a good explanation of why I can't use separate threads for the CPU and the video? How do emulators like QEMU or other x86 emulators deal with this problem?
Thanks.

Well, there are so many x86 processors, and as they have evolved over time the instruction-to-clock-cycle timings have become somewhat non-deterministic. For older CPUs like the 8088 and 6502, where the cycle counts are documented and accurate, you can simply count the clock cycles for each instruction, and when the number of simulated clock cycles is equal to or greater than the scanline draw time (or some interrupt interval, or whatever), you can do what you are suggesting. If you look at MAME, for example, or other emulators, that is basically how they do it: use each instruction's clock cycles to determine the elapsed time, and from that manage emulated time in the peripherals.
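A minimal sketch of what that cycle-counted loop might look like in C, assuming hypothetical helpers cpu_step() (execute one instruction and return its cycle cost) and render_scanline(); the 304-cycle budget is purely illustrative (roughly a 4.77 MHz CPU clock divided by a ~15.7 kHz horizontal scan rate):

/* Hypothetical cycle-counted main loop: run the CPU until one scanline's
   worth of cycles has elapsed, then emulate that scanline of video. */
#define CYCLES_PER_SCANLINE 304   /* illustrative: ~4.77 MHz / ~15.7 kHz */

extern int  cpu_step(void);        /* execute one instruction, return its cycle cost */
extern void render_scanline(void); /* advance the video emulation by one scanline */

void run_emulator(void)
{
    long cycle_debt = 0;

    for (;;) {
        /* Spend the scanline's cycle budget on CPU instructions. */
        while (cycle_debt < CYCLES_PER_SCANLINE)
            cycle_debt += cpu_step();

        cycle_debt -= CYCLES_PER_SCANLINE;  /* carry the overshoot forward */
        render_scanline();
    }
}

Carrying the overshoot into the next scanline keeps the long-run cycle count honest even though individual instructions rarely end exactly on a scanline boundary.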
Now let's say you want to run Linux on QEMU. You wouldn't want the emulated clock that tells the time to be determined by the execution of instructions; you would want to sync that clock with the hardware system clock. Likewise you might want to sync the refresh rates to the real hardware refresh rates rather than to simulated ones.
So those are the two extremes. You will need to do one or the other, or something in between.

Related

What options do I have for running recurring events at microsecond resolution from a kernel driver?

I want to create a simulation of an actual device on an x86 Linux kernel. Part of this will involve simulating timings as closely as I can. Based on some research it seems I will need at least microsecond-resolution timing. I understand that on a non-realtime system it won't be possible to get perfect timing, but I don't need perfect, just as close as I can get, perhaps by hacking around with thread scheduling / preemption options.
What I actually want to do is perform an action at every interval, i.e. run some code every X µs. I've been trying to research the best ways to do this from a kernel driver, as well as whether it's possible to do it reasonably accurately from user mode (keeping the above paragraph in mind). One of the first things that caught my eye was the HPET timer, which is programmable to generate interrupts based on programmable comparators. Unfortunately, it seems that on many chipsets it has been rather buggy in the past, and there's not much information on using it for anything other than obtaining a timestamp or using it as the main clock source. The Linux kernel provides an HPET driver that in the past seemed to offer both kernel- and user-mode interfaces, but seems only to provide a barely documented user-mode interface in more recent kernel versions. I've also read about various other kernel functions and interfaces such as the hrtimer interface and the various delay functions, though I'm having a bit of trouble understanding them and whether they are suited to my purpose.
Given my current use case, what are the best options I have for running recurring events at µs resolution from, say, a kernel driver? Obviously accuracy is probably my biggest criterion, but ease of use would come second.
Well, it's possible to achieve your accuracy in userspace -- clock_nanosleep is one ideal option, which has a relative and an absolute mode. Since clock_nanosleep is based on hrtimer in kernel mode, you may want to use hrtimer if you'd like to implement it in kernel space.
However, to make the timer work accurately, there are two important things worth mentioning.
You should set the timer slack of your process (either by writing a nonzero value in ns to /proc/self/timerslack_ns or via prctl(PR_SET_TIMERSLACK,...)). This value is treated as the 'tolerance' of the timer.
CPU power management also matters here. The CPU has many different C-states, each of which has a different exit latency. So you need to configure your cpuidle module to not use C-states other than C0, e.g. for an Intel CPU you could simply write 1 to /sys/devices/system/cpu/cpu$c/cpuidle/state$s/disable to disable state $s of CPU $c. Or just add idle=poll to your kernel options to keep the CPU active (in C0) while the kernel is idle. NOTE that this significantly increases the power consumption of the machine and makes the cooling fans noisy.
You can get a timer with delays under 10 microseconds if the two things mentioned above are configured correctly. There is a trade-off between latency and power consumption that you have to make.
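For the user-space route, a minimal sketch of a periodic microsecond-level loop using clock_nanosleep() in absolute mode, with the timer slack reduced via prctl(); the 250 µs period and CLOCK_MONOTONIC are just example choices:

/* Sketch: wake up every PERIOD_NS nanoseconds against an absolute deadline,
   so scheduling jitter does not accumulate from one period to the next. */
#include <time.h>
#include <sys/prctl.h>

#define PERIOD_NS 250000L   /* example: 250 us period */

static void add_ns(struct timespec *t, long ns)
{
    t->tv_nsec += ns;
    while (t->tv_nsec >= 1000000000L) {
        t->tv_nsec -= 1000000000L;
        t->tv_sec += 1;
    }
}

int main(void)
{
    /* Ask for 1 ns of timer slack instead of the default ~50 us. */
    prctl(PR_SET_TIMERSLACK, 1UL, 0, 0, 0);

    struct timespec next;
    clock_gettime(CLOCK_MONOTONIC, &next);

    for (;;) {
        add_ns(&next, PERIOD_NS);
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        /* ... periodic work goes here ... */
    }
    return 0;
}

Using TIMER_ABSTIME and stepping the deadline forward (rather than sleeping a relative interval each time) keeps the average rate correct even when individual wakeups are late.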

Measuring time in assembly

For several hours now, I have been trying to find a way to measure a time interval from within assembly code. What I have seen so far is that I can query the number of CPU cycles, but of course I'd need to know the CPU frequency to translate a number of cycles into time. I have found the rdmsr instruction, but it is a ring 0 instruction, and ring 0 is not somewhere I can put my code.
Some examples I've found call the Windows Query* functions for this, but I am not running on Windows. Is there any way for me to measure a time interval at user level? Any other way to get the frequency, or maybe another clock I can access directly? A system clock with one-second resolution is of course out of the question :)
I spent quite a while working with cycle counters, and eventually came to the (perhaps obvious) conclusion that RDTSC counts cycles, not time. It will never count time because the computer's clock is constantly being ramped up and down by the power management unit. So the cycle counter is extremely precise for measuring cycles, but horribly off by random amounts in real-time units. I believe Intel eventually addressed this by locking the cycle counter to a clock that is not affected by the PMU, but I haven't investigated it.
The Windows Query* functions do not actually use the RDTSC cycle counter. I thought they did until I tried to measure really small periods and found they had a 14 MHz(?) tick, which turned out to be the PCI data bus clock.
On top of all this, each core has its own cycle counter, so you have to pay attention to which core you are using when executing the RDTSC opcode. And each core has its own PMU.
The best timer you will find in Windows user mode is QueryPerformanceCounter() and QueryPerformanceFrequency().
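Since the asker is not on Windows, the closest user-level equivalent on Linux/POSIX is probably clock_gettime(CLOCK_MONOTONIC), which returns time rather than cycles and is unaffected by frequency scaling; a minimal sketch:

/* Sketch: measure an interval in nanoseconds from unprivileged user code. */
#include <time.h>
#include <stdio.h>
#include <stdint.h>

static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

int main(void)
{
    uint64_t start = now_ns();
    /* ... code being timed ... */
    uint64_t end = now_ns();
    printf("elapsed: %llu ns\n", (unsigned long long)(end - start));
    return 0;
}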

How to get a millisecond precision uptime from user-space in Linux?

I'm working on a Raspberry Pi based project that has a GPS module, which my boss wants me to use to get the time for the system clock. However, we also need to take readings from different sensors while the GPS may not have a fix, and we need to know to millisecond precision (a tolerance of 50-100 ms is fine) when these readings were taken.
Personally I want a hardware RTC for this, but I've been instructed to work around it. My idea is to mark each reading with a relative time from system boot, since the system time is not reliable and is updated from NTP/satellite time when available (I can then fix up the records using the relative time once a synchronized time is available).
So, how can I get a millisecond-precise uptime in Linux from user-space C code? Something like the jiffies value available in the kernel would be perfect.
I think you have to check the main controller (CPU) on your board. Usually there will be a hardware timer module integrated into the CPU, or a decrementer register implemented in the CPU core.
If there is a hardware timer or DEC register on your CPU, then use it to implement a periodic interrupt (the frequency can be 1000 Hz or whatever else fits). The interrupt handler can notify/wake up the user-space process to do the necessary real-time work.
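As a rough illustration of that idea on Linux, here is a minimal kernel-module sketch using the hrtimer interface (which sits on top of whatever hardware timer the platform provides); the 1 ms period is an arbitrary example, and the actual notification of the user-space process (wait queue, character device, etc.) is left out:

/* Sketch of a 1000 Hz periodic tick built on the kernel hrtimer interface. */
#include <linux/init.h>
#include <linux/module.h>
#include <linux/hrtimer.h>
#include <linux/ktime.h>

static struct hrtimer tick_timer;
static ktime_t tick_period;

static enum hrtimer_restart tick_fn(struct hrtimer *t)
{
    /* Notify/wake the user-space process here (wait queue, eventfd, ...). */
    hrtimer_forward_now(t, tick_period);
    return HRTIMER_RESTART;   /* keep the timer periodic */
}

static int __init tick_init(void)
{
    tick_period = ktime_set(0, 1000000);   /* 1 ms -> 1000 Hz, arbitrary example */
    hrtimer_init(&tick_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
    tick_timer.function = tick_fn;
    hrtimer_start(&tick_timer, tick_period, HRTIMER_MODE_REL);
    return 0;
}

static void __exit tick_exit(void)
{
    hrtimer_cancel(&tick_timer);
}

module_init(tick_init);
module_exit(tick_exit);
MODULE_LICENSE("GPL");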

How is the microsecond time of linux gettimeofday() obtained and what is its accuracy?

Wall-clock time is usually provided by the system's RTC. This mostly only provides times down to the millisecond range and typically has a granularity of 10-20 milliseconds. However, the resolution/granularity of gettimeofday() is often reported to be in the few-microseconds range. I assume the microsecond granularity must be taken from a different source.
How is the microsecond resolution/granularity of gettimeofday() accomplished?
When the part down to the millisecond is taken from the RTC and the microseconds are taken from different hardware, a problem with the phasing of the two sources arises. The two sources have to be synchronized somehow.
How is the synchronization/phasing between these two sources accomplished?
Edit: From what I've read in the links provided by amdn, particularly the following Intel link, I would add a question here:
Does gettimeofday() provide resolution/granularity in the microsecond regime at all?
Edit 2: Summarizing amdn's answer together with some further reading:
Linux only uses the real-time clock (RTC) at boot time, to synchronize with a higher-resolution counter, e.g. the Time Stamp Counter (TSC). After boot, gettimeofday() returns a time which is entirely based on the TSC value and the frequency of this counter. The initial value for the TSC frequency is corrected/calibrated by comparing the system time to an external time source. The adjustment is done/configured via the adjtimex() function. The kernel operates a phase-locked loop to ensure that the time results are monotonic and consistent.
This way it can be stated that gettimeofday() has microsecond resolution. Taking into account that more modern Time Stamp Counters run in the GHz regime, the obtainable resolution could be in the nanosecond regime. Hence this meaningful comment
/**
 * do_gettimeofday - Returns the time of day in a timeval
 * @tv: pointer to the timeval to be set
 *
 * NOTE: Users should be converted to using getnstimeofday()
 */
can be found in Linux/kernel/time/timekeeping.c. This suggests that an even higher-resolution function may become available at a later point in time. Right now getnstimeofday() is only available in kernel space.
However, looking through all the code involved in getting this about right shows quite a few comments about uncertainties. It may be possible to obtain microsecond resolution. The function gettimeofday() may even show a granularity in the microsecond regime. But: there are severe doubts about its accuracy, because the drift of the TSC frequency cannot be accurately corrected for. Also, the complexity of the code dealing with this matter inside Linux is a hint that it is in fact very difficult to get right. This is particularly, but not solely, caused by the huge number of hardware platforms Linux is supposed to run on.
Result: gettimeofday() returns monotonic time with microsecond granularity, but the time it provides is almost never in phase to within one microsecond of any other time source.
How is the microsecond resolution/granularity of gettimeofday() accomplished?
Linux runs on many different hardware platforms, so the specifics differ. On a modern x86 platform Linux uses the Time Stamp Counter, also known as the TSC, which is driven by a multiple of a crystal oscillator running at 133.33 MHz. The crystal oscillator provides a reference clock to the processor, and the processor multiplies it by some factor - for example, on a 2.93 GHz processor the multiplier is 22. The TSC historically was an unreliable source of time because implementations would stop the counter when the processor went to sleep, or because the multiplier wasn't constant as the processor shifted performance states or throttled down when it got hot. Modern x86 processors provide a TSC that is constant, invariant, and non-stop. On these processors the TSC is an excellent high-resolution clock, and the Linux kernel determines an initial approximate frequency at boot time. The TSC provides microsecond resolution for the gettimeofday() system call and nanosecond resolution for the clock_gettime() system call.
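One way to observe that granularity directly is to call gettimeofday() in a tight loop and record the smallest nonzero step it reports; a small sketch (the iteration count is arbitrary, and on a TSC-backed system the result is typically 1 µs):

/* Sketch: observe the smallest nonzero increment gettimeofday() reports. */
#include <sys/time.h>
#include <stdio.h>

int main(void)
{
    struct timeval prev, cur;
    long min_step_us = -1;

    gettimeofday(&prev, NULL);
    for (int i = 0; i < 1000000; i++) {
        gettimeofday(&cur, NULL);
        long step = (cur.tv_sec - prev.tv_sec) * 1000000L
                  + (cur.tv_usec - prev.tv_usec);
        if (step > 0 && (min_step_us < 0 || step < min_step_us))
            min_step_us = step;     /* track the finest observed tick */
        prev = cur;
    }
    printf("smallest observed step: %ld us\n", min_step_us);
    return 0;
}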
How is this synchronization accomplished?
Your first question was about how the Linux clock provides high resolution; this second question is about synchronization. This is the distinction between precision and accuracy. Most systems have a clock that is backed by a battery to keep the time of day when the system is powered down. As you might expect, this clock doesn't have high accuracy or precision, but it will get the time of day "in the ballpark" when the system starts. To get accuracy, most systems use an optional component to get the time from an external source on the network. Two common ones are
Network Time Protocol
Precision Time Protocol
These protocols define a master clock on the network (or a tier of clocks sourced by an atomic clock) and then measure network latencies to estimate the offset from the master clock. Once the offset from the master is determined, the system clock is disciplined to keep it accurate. This can be done by
Stepping the clock (a relatively large, abrupt, and infrequent time adjustment), or
Slewing the clock (adjusting the clock frequency by slowly increasing or decreasing it over a given time period)
The kernel provides the adjtimex system call to allow clock disciplining. For details on how modern Intel multi-core processors keep the TSC synchronized between cores see CPU TSC fetch operation especially in multicore-multi-processor environment.
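As an illustration of that interface, a read-only adjtimex() call (modes = 0) reports the current frequency correction and the kernel's estimated error; a minimal sketch:

/* Sketch: query the kernel's clock-discipline state without changing it. */
#include <sys/timex.h>
#include <stdio.h>

int main(void)
{
    struct timex tx = { .modes = 0 };   /* modes = 0: read-only query */
    int state = adjtimex(&tx);
    if (state == -1) { perror("adjtimex"); return 1; }

    printf("frequency offset: %ld (scaled ppm)\n", tx.freq);
    printf("estimated error:  %ld us\n", tx.esterror);
    printf("clock state:      %d (TIME_OK = %d)\n", state, TIME_OK);
    return 0;
}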
The relevant kernel source files for clock adjustments are kernel/time.c and kernel/time/timekeeping.c.
When Linux starts, it initializes the software clock using the hardware clock. See the chapter How Linux Keeps Track of Time in the Clock HOWTO.

Time Stamp counter (TSC) when switching between Kernel & User mode

I am wondering whether somebody knows some more details about the Time Stamp Counter in Linux when a context switch occurs. Until now I was of the opinion that the TSC value simply increases by 1 on each clock cycle, regardless of whether we are in kernel or user mode. I measured the performance of an application using the TSC, which yielded a result of 5 million clock cycles. Then I made some changes to the scheduler, which means that a context switch now takes considerably longer, e.g. 2 million cycles instead of 500,000 cycles. The funny bit is that when measuring the performance of the original application again, it still takes 5 million cycles... So I am wondering why it did not take considerably longer, given that a context switch now takes almost 2 million clock cycles more (and at least 3 context switches occur during execution of the application).
Is the Time Stamp Counter somehow deactivated in kernel mode? Or is the content of the TSC saved during context switches? Thanks if someone can point out what the problem could be!
As you can read on Wikipedia
With the advent of multi-core/hyperthreaded CPUs, systems with multiple CPUs, and "hibernating" operating systems, the TSC cannot be relied on to provide accurate results. The issue has two components: rate of tick and whether all cores (processors) have identical values in their time-keeping registers. There is no promise that the timestamp counters of multiple CPUs on a single motherboard will be synchronized. In such cases, programmers can only get reliable results by locking their code to a single CPU. Even then, the CPU speed may change due to power-saving measures taken by the OS or BIOS, or the system may be hibernated and later resumed (resetting the time stamp counter). Reliance on the time stamp counter also reduces portability, as other processors may not have a similar feature. Recent Intel processors include a constant rate TSC (identified by the constant_tsc flag in Linux's /proc/cpuinfo). With these processors the TSC reads at the processor's maximum rate regardless of the actual CPU running rate. While this makes time keeping more consistent, it can skew benchmarks, where a certain amount of spin-up time is spent at a lower clock rate before the OS switches the processor to the higher rate. This has the effect of making things seem like they require more processor cycles than they normally would.
I believe the TSC is actually a hardware feature of the processor you're using, i.e. reading the TSC actually uses the RDTSC processor opcode. I don't even think there's a way for the OS to alter its value; it just increases with each tick since the last power reset.
Regarding your modifications to the scheduler, is it possible that you're using a multi-core processor in such a way that the OS is not switching out your running process? You might put a call to sched_yield() or sleep(0) in your program to see whether your scheduler changes start taking effect.
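Following the note above about locking code to a single CPU, here is a sketch of pinning the measuring process to one core before reading the TSC; CPU 0 and the __rdtsc() intrinsic are arbitrary choices, and serializing instructions (e.g. rdtscp/lfence) are omitted for brevity:

/* Sketch: pin to one core so both RDTSC reads come from the same counter. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdint.h>
#include <x86intrin.h>   /* __rdtsc() */

int main(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(0, &set);                       /* example: pin to CPU 0 */
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }

    uint64_t start = __rdtsc();
    /* ... workload under test ... */
    uint64_t end = __rdtsc();
    printf("elapsed: %llu cycles\n", (unsigned long long)(end - start));
    return 0;
}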
