How does the computer understand time? There's no asm instruction for waiting - programming-languages

I have been learning about programming languages, and there is one question that keeps bothering me.
For example, let's say I programmed something that lets me push a button every 5 seconds.
How does the computer understand the waiting part (allow the button press, wait 5 seconds, then allow it again)?
I already know that higher-level programming languages get compiled into machine code so that the computer can run them. But if we take assembly, for instance, which is very close to machine code, just human readable, there is no instruction for waiting.
The waiting example is just one of many; there are many more things where I don't understand how the computer understands them ;)

The CPU is driven by a quartz crystal oscillator, commonly called the CPU clock. When a current passes through the crystal, it oscillates at a precise frequency, and the CPU can count those oscillations to keep track of time.
So the computer can understand “do something, wait for 5 seconds and then continue again”.
For more info on quartz oscillators: https://en.m.wikipedia.org/wiki/Crystal_oscillator

For short delays on simple CPUs (like microcontrollers) with a known, fixed clock frequency, no multitasking, and a simple one-instruction-per-clock-cycle design, you can wait in asm with a "delay loop". Here's the Arduino source (for AVR microcontrollers) for an implementation:
https://github.com/arduino/ArduinoCore-avr/blob/master/cores/arduino/wiring.c#L120
As you can see, the behavior depends on the clock-frequency of the CPU. You wouldn't normally loop for 5 seconds, though (that's a long time to burn power). Computers normally have timer and clock chips that can be programmed to raise an interrupt at a specific time, so you can put the CPU to sleep and have it woken up on the next interrupt if there's nothing else to do. Delay loops are good (on microcontrollers) for very short delays, too short to sleep for or even to program a timer for.
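As an illustration only (not the Arduino code linked above), a busy-wait delay in C might look like the sketch below. CPU_FREQ_HZ and CYCLES_PER_LOOP are assumed values; in practice the cycle cost of one loop iteration has to be read off the generated assembly.

#include <stdint.h>

#define CPU_FREQ_HZ     16000000UL   /* assumed clock frequency        */
#define CYCLES_PER_LOOP 4UL          /* assumed cost of one empty pass */

/* Busy-wait for roughly 'us' microseconds by burning CPU cycles.
 * 'volatile' stops the compiler from deleting the "useless" loop. */
static void delay_us(uint32_t us)
{
    volatile uint32_t n =
        (uint32_t)((uint64_t)us * CPU_FREQ_HZ / (1000000UL * CYCLES_PER_LOOP));
    while (n--)
        ;
}

On a PC with a multitasking operating system this kind of loop is both wasteful and inaccurate (preemption, frequency scaling), which is why the timer-interrupt approach described above is used there instead.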
You might want to get a little microcontroller board (not necessarily an Arduino) to play around with. There you have way less "bloat" from an operating system or libraries, and you're much closer to the hardware.

Related

[zephyr-rtos][riot-os] Zephyr vs. RIOT OS

Hello everyone,
I'm Luiz Villa, a researcher in software-defined power electronics at the University of Toulouse. My team is working on embedding an RTOS onto a microcontroller in order to create a friendlier development process for embedded control in power electronics.
We are trying as much as possible to avoid using ISRs for two reasons:
It makes it easier to collaborate in software development (our project is open-source)
Interrupts make the code execution time non-deterministic (which we wish to avoid)
We would like to benchmark Zephyr against RIOT OS in terms of thread speed. We need code that runs at 20 kHz with two to three threads doing:
ADC acquisition and data averaging
Mathematical calculations for control (using CMSIS)
Communication with the outside
Since time is such a critical element for us, we need to know:
What is the minimum time for executing a thread in Zephyr and RIOT-OS?
The time required to switch between threads in Zephyr and RIOT-OS?
Our preliminary results show that:
When testing with a single thread and a sleep time of 0us, Zephyr has a period of 9us and RIOT 5us
When testing with a single thread and a sleep time of 10us, Zephyr has a period of 39us and RIOT 15us
We use a Nucleo-G474RE with the following code: https://gitlab.laas.fr/owntech/zephyr/-/tree/test_adc_g4
We are quite surprised by our results, since we expected both OSes to consume far fewer resources than they do.
What do you think? Have you tried running either of these OSes as fast as possible? What were your results? Have you tested Zephyr's thread switching time?
Thanks for reading
Luiz
Disclaimer: I'm a RIOT core developer.
The time required to switch between threads in Zephyr and RIOT-OS?
When testing with a single thread and a sleep time of 0us, Zephyr has a period of 9us and RIOT 5us
This seems about right.
If I run one of RIOT's own scheduling microbenchmarks (e.g., tests/bench_mutex_pingpong) on the nucleo-f401re (an 84 MHz STM32F4 / Cortex-M4), this is the result:
main(): This is RIOT! (Version: 2021.04-devel-1250-gc8cb79c)
main starting
{ "result" : 157303, "ticks" : 534 }
The test measures how often a thread can switch to another thread and back.
One iteration (two context switches) takes ~534 clock cycles, or 1000000/157303 ≈ 6.36us, which is close to the number you got.
This is the context switch overhead. A thread's registers and state are stored on its stack, the scheduler runs to figure out the next runnable thread, and restores that thread's registers and state.
I'm surprised Zephyr isn't closer to RIOT. Maybe check if it was compiled with optimization enabled, or if some enabled features increase the switching overhead (e.g., is the MPU enabled?).
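If you want a feel for the same ping-pong idea on a desktop machine, here is a rough POSIX-threads version (compile with -pthread). This is not RIOT's benchmark, just an illustration of the measurement approach; the numbers will include far more OS overhead, and you should pin the process to one core (e.g. with taskset) so every round trip really forces two context switches.

#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>
#include <time.h>

#define ITERATIONS 100000

static sem_t ping, pong;

static void *partner(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        sem_wait(&ping);            /* wait for the main thread */
        sem_post(&pong);            /* hand control back        */
    }
    return NULL;
}

int main(void)
{
    pthread_t t;
    struct timespec start, end;

    sem_init(&ping, 0, 0);
    sem_init(&pong, 0, 0);
    pthread_create(&t, NULL, partner, NULL);

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < ITERATIONS; i++) {
        sem_post(&ping);            /* wake the partner thread  */
        sem_wait(&pong);            /* wait to be woken again   */
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    double ns = (end.tv_sec - start.tv_sec) * 1e9
              + (end.tv_nsec - start.tv_nsec);
    printf("%.0f ns per round trip\n", ns / ITERATIONS);

    pthread_join(t, NULL);
    return 0;
}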
What is the minimum time for executing a thread in Zephyr and RIOT-OS?
Whatever is left after ISRs have been served and contexts have been switched.
What do you think?
Having three threads scheduled at 20 kHz and having them do actual work is gonna be tight on a Cortex-M with either Zephyr or RIOT, so I think you should re-architect your application.
Having multiple threads to separate concerns logically is very nice, but here a classic main loop might be a better choice.
Something like this (pseudocode):
void loop() {
    while (1) {
        handle_adc();            /* ADC acquisition and averaging    */
        do_dsp_computation();    /* control math (CMSIS)             */
        send_data();             /* communication with the outside   */
        periodic_sleep_us(50);   /* 50 us period -> 20 kHz loop rate */
    }
}

Linux' hrtimer - microsecond precision?

Is it possible to execute tasks on a Linux host with microsecond precision? I.e., I'd like to execute a task at a specific instant of time. I know Linux is not a real-time system, but I'm searching for the best solution on Linux.
So far, I've created a kernel module, set up an hrtimer, and measured the jitter when the callback function is entered (I don't really care too much about the actual delay; it's the jitter that counts) - it's about 20-50us. That's not significantly better than using timerfd in userspace (I also tried using real-time priority for the process, but that did not really change anything).
I'm running Linux 3.5.0 (just an example; I tried different kernels from 2.6.35 to 3.7), /proc/timer_list shows hrtimer_interrupt, and I'm not running in failsafe mode, which disables hrtimer functionality. I tried on different CPUs (Intel Atom to Core i7).
My best idea so far would be using hrtimer in combination with ndelay/udelay. Is this really the best way to do it? I can't believe it's not possible to trigger a task with microsecond precision. Running the code in kernel space as a module is acceptable; it would be great if the code was not interrupted by other tasks, though. I don't really care too much about the rest of the system - the task will be executed only a few times a second, so using mdelay/ndelay to burn the CPU for some microseconds every time the task should be executed would not really matter. Although I'd prefer a more elegant solution.
I hope the question is clear; I found a lot of topics concerning timer precision but no real answer to this problem.
You can do what you want from user space:
use clock_gettime() with CLOCK_REALTIME to get the time of day with nanosecond resolution
use nanosleep() to yield the CPU until you are close to the time you need to execute your task (you can only count on roughly millisecond resolution for the wakeup)
use a spin loop calling clock_gettime() until you reach the desired time
execute your task
The clock_gettime() function is implemented via the vDSO in recent kernels on modern x86 processors - it takes 20-30 nanoseconds to get the time of day with nanosecond resolution, so you should be able to call clock_gettime() over 30 times per microsecond. Using this method, your task should dispatch within 1/30th of a microsecond of the intended time.
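A rough sketch of that recipe (the 100 µs spin margin is an arbitrary choice for illustration, and error handling is omitted):

#include <time.h>

#define SPIN_GUARD_NS 100000L   /* start spinning ~100 us early (arbitrary) */

/* Block until the absolute CLOCK_REALTIME deadline: sleep for most of
 * the wait, then busy-poll clock_gettime() for the final stretch. */
static void wait_until(const struct timespec *deadline)
{
    struct timespec now;

    for (;;) {
        clock_gettime(CLOCK_REALTIME, &now);
        long long remaining_ns =
            (deadline->tv_sec - now.tv_sec) * 1000000000LL
            + (deadline->tv_nsec - now.tv_nsec);

        if (remaining_ns <= 0)
            return;                 /* deadline reached: run the task */

        if (remaining_ns > SPIN_GUARD_NS) {
            /* Yield the CPU for the bulk of the wait. */
            struct timespec ts = {
                .tv_sec  = (remaining_ns - SPIN_GUARD_NS) / 1000000000LL,
                .tv_nsec = (remaining_ns - SPIN_GUARD_NS) % 1000000000LL,
            };
            nanosleep(&ts, NULL);
        }
        /* Otherwise fall through and keep polling clock_gettime(). */
    }
}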
The default Linux kernel timer ticks every millisecond. Microsecond precision is way beyond what current consumer hardware running a general-purpose kernel can reliably deliver.
The jitter you see is due to a host of factors, like interrupt handling and servicing higher-priority tasks. You can cut that down somewhat by selecting hardware carefully and only enabling what is really needed. The real-time patch series for the kernel (see the HOWTO) might be an option to reduce it a bit further.
Always keep in mind that any gain has a definite cost in terms of interactivity, stability, and (last, but by no means least) your time spent building, tuning, troubleshooting, and keeping the house of cards from falling apart.

How NOHZ=ON affects do_timer() in Linux kernel?

In a simple experiment I set NOHZ=OFF and used printk() to print how often the do_timer() function gets called. It gets called every 10 ms on my machine.
However, if NOHZ=ON then there is a lot of jitter in how do_timer() gets called. Most of the time it does get called every 10 ms, but there are times when it completely misses the deadlines.
I have researched both do_timer() and NOHZ. do_timer() is the function responsible for updating the jiffies value and is also involved in the round-robin scheduling of processes.
The NOHZ feature switches off the hi-res timers on the system.
What I am unable to understand is how hi-res timers can affect do_timer(). Even if the hi-res hardware is in a sleep state, the persistent clock is more than capable of executing do_timer() every 10 ms. Secondly, if do_timer() is not executing when it should, that means some processes are not getting their timeshare when they ideally should be getting it. A lot of googling does show that for many people many applications start working much better when NOHZ=OFF.
To make a long story short, how does NOHZ=ON affect do_timer()?
Why does do_timer() miss its deadlines?
First, let's understand what a tickless kernel is (NOHZ=ON or CONFIG_NO_HZ set) and what the motivation was for introducing it into the Linux kernel in 2.6.17.
From http://www.lesswatts.org/projects/tickless/index.php,
Traditionally, the Linux kernel used a periodic timer for each CPU. This timer did a variety of things, such as process accounting, scheduler load balancing, and maintaining per-CPU timer events. Older Linux kernels used a timer with a frequency of 100Hz (100 timer events per second or one event every 10ms), while newer kernels use 250Hz (250 events per second or one event every 4ms) or 1000Hz (1000 events per second or one event every 1ms).
This periodic timer event is often called "the timer tick". The timer tick is simple in its design, but has a significant drawback: the timer tick happens periodically, irrespective of the processor state, whether it's idle or busy. If the processor is idle, it has to wake up from its power saving sleep state every 1, 4, or 10 milliseconds. This costs quite a bit of energy, consuming battery life in laptops and causing unnecessary power consumption in servers.
With "tickless idle", the Linux kernel has eliminated this periodic timer tick when the CPU is idle. This allows the CPU to remain in power saving states for a longer period of time, reducing the overall system power consumption.
So reducing power consumption was one of the main motivations for the tickless kernel. But, as usual, performance takes a hit along with the decreased power consumption. For desktop computers, performance is of utmost concern, and hence you see that for most of them NOHZ=OFF works pretty well.
In Ingo Molnar's own words:
The tickless kernel feature (CONFIG_NO_HZ) enables 'on-demand' timer interrupts: if there is no timer to be expired for say 1.5 seconds when the system goes idle, then the system will stay totally idle for 1.5 seconds. This should bring cooler CPUs and power savings: on our (x86) testboxes we have measured the effective IRQ rate to go from HZ to 1-2 timer interrupts per second.
Now, let's try to answer your queries.
What I am unable to understand is how can hi-res timers affect the do_timer?
If a system supports high-resolution timers, timer interrupts can occur more frequently than the usual 10ms on most systems; i.e., these timers try to make the system more responsive by leveraging the hardware's capabilities and firing timer interrupts even faster, say every 100us. So with the NOHZ option, these timers are quieted down while the CPU is idle, and hence do_timer() runs less often.
Even if hi-res hardware is in sleep state the persistent clock is more than capable to execute do_timer every 10ms
Yes, it is capable. But the intention of NOHZ is exactly the opposite: to prevent frequent timer interrupts!
Secondly if do_timer is not executing when it should that means some processes are not getting their timeshare when they should ideally be getting it
As caf noted in the comments, NOHZ does not cause processes to get scheduled less often, because it only kicks in when the CPU is idle - in other words, when no processes are schedulable. Only the process accounting stuff will be done at a delayed time.
Why does do_timer() miss its deadlines?
As elaborated above, this is the intended design of NOHZ.
I suggest you go through the tick-sched.c kernel sources as a starting point. Search for CONFIG_NO_HZ and try to understand the new functionality added for the NOHZ feature.
Here is one test performed to measure the Impact of a Tickless Kernel

Incrementing Clocks

When a process is set to run with an initial time slice of 10, for example, something in the hardware should know this initial timeslice, decrement it, and fire an interrupt when the time slice reaches 0!
In the FreeBSD kernel, I understand that hardclock and softclock do this accounting. But my question is: does this decrementing of the clock happen in parallel with the execution of the process?
I'll use the PIT as an example here, because it's the simplest timing mechanism (and has been around for quite a while).
Also, this answer is fairly x86-specific, and OS-agnostic; I don't know enough about the internals of FreeBSD and Linux to answer for them specifically. Someone else might be more capable of that.
Essentially, the timeslice is "decremented" parallel to the execution of the process as the timer creates an IRQ for each "tick" (note that timers such as the HPET can do 'one-shot' mode, which fires an IRQ after a specific delay, which can be used for scheduling as well). Once the timeslice decrements to zero, the scheduler is notified and a task switch occurs. All this happens "at the same time" as your process: the IRQ jumps in, runs some code, then lets your process keep going until the timeslice runs out.
It should be noted that, generally speaking, you don't see a process running to the end of its timeslice, as task switches can occur as the direct result of a system call (for example, a read from disk that blocks, or even writing to a terminal).
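In rough C, the per-tick bookkeeping described above looks something like the sketch below; the names (current_task, need_resched, timer_tick) are made up for illustration and are not the actual FreeBSD or Linux symbols.

/* Simplified sketch of per-tick scheduler bookkeeping. */
struct task {
    int timeslice;              /* ticks remaining for this task */
    /* ... */
};

extern struct task *current_task;
extern int need_resched;

/* Called from the timer interrupt handler, i.e. "in parallel" with the
 * running program in the sense that the hardware forces this code to
 * run no matter what the program was doing. */
void timer_tick(void)
{
    /* account this tick against the currently running task */
    if (current_task->timeslice > 0)
        current_task->timeslice--;

    /* once the slice is used up, request a task switch on the way
     * back out of the interrupt */
    if (current_task->timeslice == 0)
        need_resched = 1;
}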
This was simpler in the misty past: a clock chip -- a discrete device on the motherboard -- would be configured to fire interrupts periodically at a rate of X Hz. Every time this "timer interrupt" went off, execution of the current program would be suspended (just like any other interrupt) and the kernel's scheduler code would decrement its timeslice. When the timeslice got all the way to zero, the kernel would take the CPU away from the program and give it to another one. The clock chip, being separate from the CPU, obviously runs in parallel with the execution of the program, but the kernel's bookkeeping work has to interrupt the program (this is the misty past we're talking about, so there is only one CPU, so kernel code and user code cannot run simultaneously).
Nowadays, the clock is not a discrete device; it's part of the CPU, and it can be programmed to do all sorts of clever things. Most importantly, it can be programmed to fire one interrupt after N microseconds, where N can be quite large; this allows the kernel to idle the CPU for a very long time (in computer terms; maybe, like, a whole second) if there's nothing constructive for it to do, saving power. Meanwhile, it's hard to find a single-core CPU anymore; kernels do all sorts of clever tricks to push their bookkeeping work off to CPUs that don't have anything better to do, and timeslice accounting has gotten a whole lot more complicated. Linux currently uses the "Completely Fair Scheduler", which doesn't even really have a concept of "time slices". I don't know what FreeBSD's got, but I would be surprised if it was simple.
So the short answer to your question is "mostly in parallel, more so now than in the past, but it's not remotely as simple as a countdown timer anymore".

sleep(0)? consistent time keeping in code?

Right now I am loading a file, then using gettimeofday and tracking the CPU time with tv_usec.
My results vary: I get values in the 250s to 280s, but sometimes in the 300s or 500s. I tried usleep and sleep(0) and sleep(1) with no success. The time still varies vastly. I thought sleep(1) (seconds on Linux, not the Windows Sleep in ms) would have solved it. How can I keep track of time in a more consistent way for testing? Maybe I should wait until I have much larger test data and more complex code before starting measurements?
The currently recommended interface for high-res time on Linux (and POSIX in general) is clock_gettime. See the man page.
clock_gettime(CLOCK_REALTIME, struct timespec *tp) // for wall-clock time
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, struct timespec *tp) // for CPU time
But read the man page. Note that you need to link with -lrt, because POSIX says so, I guess. Maybe to avoid symbol conflicts in -lc, for old programs that defined their own clock_gettime? But dynamic libs use weak symbols...
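A minimal usage sketch (do_work() is just a placeholder for whatever you're measuring; on older glibc, link with -lrt):

#include <stdio.h>
#include <time.h>

/* Placeholder for the code being measured. */
static void do_work(void)
{
    volatile unsigned long x = 0;
    for (unsigned long i = 0; i < 1000000UL; i++)
        x += i;
}

int main(void)
{
    struct timespec start, end;

    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start);
    do_work();
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &end);

    double seconds = (end.tv_sec - start.tv_sec)
                   + (end.tv_nsec - start.tv_nsec) / 1e9;
    printf("CPU time: %.9f s\n", seconds);
    return 0;
}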
The best sleep function is nanosleep. It doesn't mess around with signals or any crap like usleep. It is defined to just sleep, and not have any other side effects. And it tells you if you woke up early (e.g. from signals), so you don't necessarily have to call another time function.
Anyway, you're going to have a hard time testing one rep of something that short that involves a system call. There's a huge amount of opportunity for variation, e.g. the scheduler may decide that some other work needs doing (unlikely if your process just started; you won't have used up your timeslice yet). CPU cache effects (L2 and TLB) are another easy source of variation.
If you have a multi-core machine and a single-threaded benchmark for the code you're optimizing, you can give it realtime priority pinned to one of your cores. Make sure you choose the core that isn't handling interrupts, or your keyboard (and everything else) will be locked out until it's done. Use taskset (for pinning to one CPU) and chrt (for setting realtime prio).
See this mail I sent to gmp-devel with this trick:
http://gmplib.org/list-archives/gmp-devel/2008-March/000789.html
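For reference, the same pinning and real-time priority setup can also be done from inside the program with sched_setaffinity() and sched_setscheduler(); here is a sketch (CPU 3 and priority 50 are arbitrary choices, and this needs root or CAP_SYS_NICE):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t mask;
    struct sched_param sp = { .sched_priority = 50 };

    CPU_ZERO(&mask);
    CPU_SET(3, &mask);                        /* pin to CPU 3 (arbitrary) */
    if (sched_setaffinity(0, sizeof(mask), &mask) != 0)
        perror("sched_setaffinity");

    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
        perror("sched_setscheduler");

    /* ... run the benchmark here ... */
    return 0;
}

As the answer warns, be careful: a realtime-priority process that spins will starve everything else on that core until it finishes.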
Oh yeah, for the most precise timing, you can use rdtsc yourself (on x86/amd64). If you don't have any other syscalls in what you're benching, it's not a bad idea. Grab a benchmarking framework to put your function into. GMP has a pretty decent one. It's maybe not set up well for benchmarking functions that aren't in GMP and called mpn_whatever, though. I don't remember, and it's worth a look.
Are you trying to measure how long it takes to load a file? Usually if you're performance testing some bit of code that is already pretty fast (sub-second), then you will want to repeat the same code a number of times (say a thousand or a million), time the whole lot, then divide the total time by the number of iterations.
Having said that, I'm not quite sure what you're using sleep() for. Can you post an example of what you intend to do?
I would recommend putting that code in a for loop. Run it over 1000 or 10000 iterations. There are problems with this if you're timing only a few instructions, but it should help.
Larger data sets also help of course.
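A bare-bones version of that loop-and-divide approach (code_under_test() is just a placeholder for your own code):

#include <stdio.h>
#include <time.h>

#define ITERATIONS 1000000L

/* Placeholder for the snippet you actually want to measure. */
static void code_under_test(void)
{
    volatile int x = 0;
    x++;
}

int main(void)
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < ITERATIONS; i++)
        code_under_test();
    clock_gettime(CLOCK_MONOTONIC, &end);

    double total_ns = (end.tv_sec - start.tv_sec) * 1e9
                    + (end.tv_nsec - start.tv_nsec);
    printf("%.1f ns per iteration on average\n", total_ns / ITERATIONS);
    return 0;
}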
sleep is going to deschedule your thread from the CPU. It is not a way to count time precisely.
