Incrementing Clocks - linux

When a process is set to run with an initial time slice of, say, 10, something in the hardware must know this initial time slice, decrement it, and fire an interrupt when it reaches 0.
In the FreeBSD kernel, I understand that hardclock and softclock do this accounting. But my question is: does this decrementing of the clock happen in parallel with the execution of the process?

I'll use the PIT as an example here, because it's the simplest timing mechanism (and has been around for quite a while).
Also, this answer is fairly x86-specific but OS-agnostic. I don't know enough about the internals of FreeBSD and Linux to answer for them specifically. Someone else might be more capable of that.
Essentially, the timeslice is "decremented" parallel to the execution of the process as the timer creates an IRQ for each "tick" (note that timers such as the HPET can do 'one-shot' mode, which fires an IRQ after a specific delay, which can be used for scheduling as well). Once the timeslice decrements to zero, the scheduler is notified and a task switch occurs. All this happens "at the same time" as your process: the IRQ jumps in, runs some code, then lets your process keep going until the timeslice runs out.
It should be noted that, generally speaking, you don't see a process running to the end of its timeslice, as task switches can occur as the direct result of a system call (for example, a read from disk that blocks, or even writing to a terminal).
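The tick-driven accounting described above can be sketched as a small simulation. This is only an illustrative model, not code from any kernel: the `Task` struct and the functions `on_timer_tick` and `run_until_preempted` are hypothetical names, and the timer IRQ is simulated by an ordinary function call.

```rust
/// A runnable task with a tick-based timeslice (hypothetical model, not kernel code).
#[derive(Debug)]
struct Task {
    name: &'static str,
    remaining_ticks: u32,
}

/// Called once per simulated timer IRQ. Returns true when the timeslice
/// is used up and the scheduler should switch to another task.
fn on_timer_tick(current: &mut Task) -> bool {
    current.remaining_ticks = current.remaining_ticks.saturating_sub(1);
    current.remaining_ticks == 0
}

/// Runs `task` until its timeslice expires and returns how many ticks it ran.
fn run_until_preempted(task: &mut Task) -> u32 {
    let mut ticks = 0;
    loop {
        // ... the task's own instructions would execute here ...
        ticks += 1; // the timer IRQ briefly interrupts the task
        if on_timer_tick(task) {
            return ticks; // the scheduler would now pick another task
        }
    }
}

fn main() {
    let mut task = Task { name: "demo", remaining_ticks: 10 };
    let ran = run_until_preempted(&mut task);
    println!("{} was preempted after {} ticks", task.name, ran);
}
```

The point of the sketch is that the hardware timer only delivers ticks; the decrementing itself is ordinary kernel code that runs in the interrupt handler, in between the task's own instructions.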

This was simpler in the misty past: a clock chip -- a discrete device on the motherboard -- would be configured to fire interrupts periodically at a rate of X Hz. Every time this "timer interrupt" went off, execution of the current program would be suspended (just like any other interrupt) and the kernel's scheduler code would decrement its timeslice. When the timeslice got all the way to zero, the kernel would take the CPU away from the program and give it to another one. The clock chip, being separate from the CPU, obviously runs in parallel with the execution of the program, but the kernel's bookkeeping work has to interrupt the program (this is the misty past we're talking about, so there is only one CPU, so kernel code and user code cannot run simultaneously).
Nowadays, the clock is not a discrete device, it's part of the CPU, and it can be programmed to do all sorts of clever things. Most importantly it can be programmed to fire one interrupt after N microseconds, where N can be quite large; this allows the kernel to idle the CPU for a very long time (in computer terms; maybe, like, a whole second) if there's nothing constructive for it to do, saving power. Meanwhile, it's hard to find a single-core CPU anymore, kernels do all sorts of clever tricks to push their bookkeeping work off to CPUs that don't have anything better to do, and timeslice accounting has gotten a whole lot more complicated. Linux currently uses the "Completely Fair Scheduler" which doesn't even really have a concept of "time slices". I don't know what FreeBSD's got, but I would be surprised if it was simple.
So the short answer to your question is "mostly in parallel, more so now than in the past, but it's not remotely as simple as a countdown timer anymore".

Related

Thread sleeps longer than expected

I have this code:
use std::{thread, time};

fn main() {
    let k = time::Instant::now();
    thread::sleep(time::Duration::from_micros(10));
    let elapsed = k.elapsed().as_micros();
    println!("{}", elapsed);
}
My output is always somewhere between 70 and 90. I expect it to be 10, why is this number 7x higher?
This actually doesn't really have anything to do with Rust.
On a typical multi-processing, user-interactive operating system (i.e., every consumer OS you've used), your thread isn't special. It's one among many, and the CPUs need to be shared.
Your operating system has a component called a scheduler, whose job it is to share the hardware resources. It will boot your thread off the CPU quite often. This typically happens:
On every system call
Every time an interrupt hits the CPU
When the scheduler kicks you off to give other processes/threads a chance (this is called preemption, and typically happens tens of times a second)
Thus, your userland process can't possibly do anything timing-related with such fine precision.
There are several solution paths you can explore:
Increase the amount of CPU your operating system gives you. Some ideas:
Increase the process's priority
Pin the thread to a particular CPU core, to give it exclusive use (this means you lose throughput, because if your thread is idle, no other thread's work can borrow that CPU)
Switch to a real-time operating system which makes guarantees about latency and timing.
Offload the work to some hardware that's specialized to do it, without the involvement of your process.
E.g. offload sine wave generation to a hardware sound-card, WiFi radio processing to a radio controller, etc.
Use your own micro controller to do the real-time stuff, and communicate to it over something like I2C or SPI.
In your case of running some simple code on a userland process, I think your easiest bet is to just pin your process. Your existing code will work as-is, you'll just lose the throughput of one of your cores (but luckily, you have multiple).
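To see the overshoot for yourself, here is a minimal sketch that compares the requested duration with what actually elapsed. The helper names `measure_sleep` and `spin_wait` are made up for this example. The busy-wait variant is not something the answer above recommends; it is included only for comparison, it burns CPU, and it can still be preempted by the scheduler.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Sleeps for `requested` and returns how long the call actually took.
/// `thread::sleep` only promises to sleep for at least the requested time.
fn measure_sleep(requested: Duration) -> Duration {
    let start = Instant::now();
    thread::sleep(requested);
    start.elapsed()
}

/// Busy-waits until `requested` has elapsed instead of asking the OS to sleep.
fn spin_wait(requested: Duration) -> Duration {
    let start = Instant::now();
    while start.elapsed() < requested {
        std::hint::spin_loop(); // hint to the CPU that we are spinning
    }
    start.elapsed()
}

fn main() {
    let requested = Duration::from_micros(10);
    let slept = measure_sleep(requested);
    let spun = spin_wait(requested);
    println!(
        "requested {:?}, sleep took {:?}, spin took {:?}",
        requested, slept, spun
    );
}
```

The exact numbers depend on the OS, its timer configuration, and system load, so treat any output as a measurement of your machine rather than a property of Rust.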

How does the Computer understand time? There's no asm instruction for waiting

I have been learning about programming languages and there is one question which bothers me all the time.
For example let's say that I programmed something which allows me to push a button every 5 seconds.
How does the Computer understand the waiting part(allows to push the button - waits 5 seconds and allows again)?
I already know that higher-level programming languages are first compiled into machine code so that the computer can run it. But if we take assembler for instance, which is very close to machine code, just human-readable, there is no instruction for waiting.
The example which I have given with the waiting is just one example, there are much more things which I do not understand how the computer understands ;)
The computer has a quartz crystal oscillator that drives the CPU clock. When a voltage is applied across it, the crystal vibrates at a precise frequency. The CPU can then count those oscillations to keep track of time.
So the computer can understand "do something, wait for 5 seconds and then continue again".
For more info on crystal oscillators: https://en.m.wikipedia.org/wiki/Crystal_oscillator
For short delays on simple CPUs (like microcontrollers) with a known fixed clock frequency, and no multitasking, and a simple one instruction per clock cycle design, you can wait in asm with a "delay loop". Here's the arduino source (for AVR microcontrollers) for an implementation:
https://github.com/arduino/ArduinoCore-avr/blob/master/cores/arduino/wiring.c#L120
As you can see, the behavior depends on the clock-frequency of the CPU. You wouldn't normally loop for 5 seconds, though (that's a long time to burn power). Computers normally have timer and clock chips that can be programmed to raise an interrupt at a specific time, so you can put the CPU to sleep and have it woken up on the next interrupt if there's nothing else to do. Delay loops are good (on microcontrollers) for very short delays, too short to sleep for or even to program a timer for.
You might want to get a little microcontroller-board (not necessarily arduino) to play around with. There you have way less "bloat" from an operating system or libraries and you're much closer to the hardware.
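The arithmetic behind a delay loop is simple enough to sketch. The function below is a hypothetical helper, not the Arduino implementation: it assumes a fixed clock frequency and a known, constant number of cycles per loop iteration, both of which you would have to take from your chip's datasheet and your compiled loop.

```rust
/// Number of loop iterations needed to burn roughly `micros` microseconds.
/// Assumes a fixed clock (`cpu_hz`) and a constant, nonzero cost per
/// iteration (`cycles_per_iteration`); both values are assumptions here.
fn delay_loop_iterations(cpu_hz: u64, cycles_per_iteration: u64, micros: u64) -> u64 {
    // Cycles the CPU executes during the requested delay.
    let total_cycles = cpu_hz * micros / 1_000_000;
    total_cycles / cycles_per_iteration
}

fn main() {
    // Example values: a 16 MHz clock and a 4-cycle loop body.
    let iterations = delay_loop_iterations(16_000_000, 4, 10);
    println!("spin {} times for a 10 us delay", iterations);
}
```

This also shows why the technique breaks down on desktop CPUs: with variable clock speeds, caches, and multitasking, neither input is constant.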

Why processes are deprived of CPU for TOO long while busy looping in Linux kernel?

At first glance, my question might look a bit trivial. Please bear with me and read it completely.
I have identified a busy loop in my Linux kernel module. Due to this, other processes (e.g. sshd) are not getting CPU time for long spans of time (like 20 seconds). This is understandable, as my machine has only a single CPU and the busy loop gives the scheduler no chance to run other processes.
Just to experiment, I added schedule() after each iteration of the busy loop. Even though this would keep the CPU busy, it should still let other processes run, since I am calling schedule(). But this doesn't seem to be happening. My user-level processes are still hanging for long spans of time (20 seconds).
In this case, the kernel thread got nice value -5 and user level threads got nice value 0. Even with low priority of user level thread, I think 20 seconds is too long to not get CPU.
Can someone please explain why this could be happening?
Note: I know how to remove busy loop completely. But, I want to understand the behaviour of kernel here. Kernel version is 2.6.18 and kernel pre-emption is disabled.
The schedule() function simply invokes the scheduler - it doesn't take any special measures to arrange that the calling thread will be replaced by a different one. If the current thread is still the highest priority one on the run queue then it will be selected by the scheduler once again.
It sounds as if your kernel thread is doing very little work in its busy loop and it's calling schedule() every time round. Therefore, it's probably not using much CPU time itself and hence doesn't have its priority reduced much. Negative nice values carry heavier weight than positives, so the difference between a -5 and a 0 is quite pronounced. The combination of these two effects means I'm not too surprised that user space processes miss out.
As an experiment you could try calling the scheduler every Nth iteration of the loop (you'll have to experiment to find a good value of N for your platform) and see if the situation is better - calling schedule() too often will just waste lots of CPU time in the scheduler. Of course, this is just an experiment - as you have already pointed out, avoiding busy loops is the correct option in production code, and if you want to be sure your thread is replaced by another then set it to TASK_INTERRUPTIBLE before calling schedule() so that it removes itself from the run queue (as has already been mentioned in comments).
Note that your kernel (2.6.18) is using the O(1) scheduler, which existed until the Completely Fair Scheduler was added in 2.6.23 (the O(1) scheduler having been added in 2.6 to replace the even older O(n) scheduler). The CFS doesn't use the O(1) scheduler's priority-array run queues (it keeps runnable tasks in a red-black tree ordered by virtual runtime) and works in a different way, so you might well see different behaviour - I'm less familiar with it, however, so I wouldn't like to predict exactly what differences you'd see. I've seen enough of it to know that "completely fair" isn't the term I'd use on heavily loaded SMP systems with a large number of both cores and processes, but I also accept that writing a scheduler is a very tricky task and it's far from the worst I've seen, and I've never had a significant problem with it on a 4-8 core desktop machine.
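The first point (calling the scheduler does not by itself hand the CPU to someone else) can be illustrated with a toy model. This is a deliberately oversimplified sketch, not how the O(1) scheduler is implemented: `pick_next`, `Task`, and `State` are hypothetical names, priority is reduced to a static nice value, and dynamic priority adjustments and timeslices are ignored.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Runnable,
    Sleeping,
}

#[derive(Debug)]
struct Task {
    name: &'static str,
    nice: i32, // lower nice value means higher priority
    state: State,
}

/// Toy stand-in for schedule(): returns the index of the runnable task
/// with the lowest nice value, or None if nothing is runnable.
fn pick_next(tasks: &[Task]) -> Option<usize> {
    tasks
        .iter()
        .enumerate()
        .filter(|(_, t)| t.state == State::Runnable)
        .min_by_key(|(_, t)| t.nice)
        .map(|(i, _)| i)
}

fn main() {
    let mut tasks = vec![
        Task { name: "kthread", nice: -5, state: State::Runnable },
        Task { name: "sshd", nice: 0, state: State::Runnable },
    ];

    // The busy-looping kernel thread "yields", but it is still runnable
    // and still has the best priority, so it is picked again.
    let first = pick_next(&tasks).unwrap();
    println!("picked {}", tasks[first].name);

    // Once it marks itself as sleeping, another task gets the CPU.
    tasks[0].state = State::Sleeping;
    let second = pick_next(&tasks).unwrap();
    println!("picked {}", tasks[second].name);
}
```

In this model, marking the task as `Sleeping` plays the role of setting the task state before calling schedule(), which is what takes the thread out of consideration.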

How does the OS scheduler regain control of CPU?

I recently started to learn how the CPU and the operating system works, and I am a bit confused about the operation of a single-CPU machine with an operating system that provides multitasking.
Supposing my machine has a single CPU, this would mean that, at any given time, only one process could be running.
Now, I can only assume that the scheduler used by the operating system to control the access to the precious CPU time is also a process.
Thus, in this machine, either the user process or the scheduling system process is running at any given point in time, but not both.
So here's a question:
Once the scheduler gives up control of the CPU to another process, how can it regain CPU time to run itself again to do its scheduling work? I mean, if any given process currently running does not yield the CPU, how could the scheduler itself ever run again and ensure proper multitasking?
So far, I had been thinking, well, if the user process requests an I/O operation through a system call, then in the system call we could ensure the scheduler is allocated some CPU time again. But I am not even sure if this works in this way.
On the other hand, if the user process in question were inherently CPU-bound, then, from this point of view, it could run forever, never letting other processes, not even the scheduler run again.
Supposing time-sliced scheduling, I have no idea how the scheduler could slice the time for the execution of another process when it is not even running?
I would really appreciate any insight or references that you can provide in this regard.
The OS sets up a hardware timer (Programmable interval timer or PIT) that generates an interrupt every N milliseconds. That interrupt is delivered to the kernel and user-code is interrupted.
It works like any other hardware interrupt. For example your disk will force a switch to the kernel when it has completed an IO.
Google "interrupts". Interrupts are at the centre of multithreading, preemptive kernels like Linux/Windows. With no interrupts, the OS will never do anything.
While investigating/learning, try to ignore any explanations that mention "timer interrupt", "round-robin" and "time-slice", or "quantum" in the first paragraph – they are dangerously misleading, if not actually wrong.
Interrupts, in OS terms, come in two flavours:
Hardware interrupts – those initiated by an actual hardware signal from a peripheral device. These can happen at (nearly) any time and switch execution from whatever thread might be running to code in a driver.
Software interrupts – those initiated by OS calls from currently running threads.
Either interrupt may request the scheduler to make threads that were waiting ready/running or cause threads that were waiting/running to be preempted.
The most important interrupts are those hardware interrupts from peripheral drivers – those that make threads ready that were waiting on IO from disks, NIC cards, mice, keyboards, USB etc. The overriding reason for using preemptive kernels, and all the problems of locking, synchronization, signaling etc., is that such systems have very good IO performance because hardware peripherals can rapidly make threads ready/running that were waiting for data from that hardware, without any latency resulting from threads that do not yield, or waiting for a periodic timer reschedule.
The hardware timer interrupt that causes periodic scheduling runs is important because many system calls have timeouts in case, say, a response from a peripheral takes longer than it should.
On multicore systems the OS has an interprocessor driver that can cause a hardware interrupt on other cores, allowing the OS to interrupt/schedule/dispatch threads onto multiple cores.
On seriously overloaded boxes, or those running CPU-intensive apps (a small minority), the OS can use the periodic timer interrupts, and the resulting scheduling, to cycle through a set of ready threads that is larger than the number of available cores, and allow each a share of available CPU resources. On most systems this happens rarely and is of little importance.
Every time I see "quantum", "give up the remainder of their time-slice", "round-robin" and similar, I just cringe...
To complement @usr's answer, quoting from Understanding the Linux Kernel:
The schedule( ) Function
schedule( ) implements the scheduler. Its objective is to find a
process in the runqueue list and then assign the CPU to it. It is
invoked, directly or in a lazy way, by several kernel routines.
[...]
Lazy invocation
The scheduler can also be invoked in a lazy way by setting the
need_resched field of current [process] to 1. Since a check on the value of this
field is always made before resuming the execution of a User Mode
process (see the section "Returning from Interrupts and Exceptions" in
Chapter 4), schedule( ) will definitely be invoked at some close
future time.
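The lazy invocation described in the quote can be modelled roughly as follows. This is a simplified simulation, not kernel code: only the name `need_resched` comes from the quoted text, while the `Cpu` struct and its methods are hypothetical and stand in for the interrupt handler and the return-to-user-mode path.

```rust
/// Toy model of one CPU. Only `need_resched` is a name from the quoted
/// book; everything else is made up for illustration.
struct Cpu {
    need_resched: bool,
    schedule_calls: u32,
}

impl Cpu {
    /// Simulated timer interrupt: it does not switch tasks itself,
    /// it only records that a reschedule is wanted.
    fn timer_interrupt(&mut self, timeslice_expired: bool) {
        if timeslice_expired {
            self.need_resched = true;
        }
        // Every interrupt ends by returning to the interrupted code.
        self.return_to_user_mode();
    }

    /// The flag is checked just before user-mode execution resumes.
    fn return_to_user_mode(&mut self) {
        if self.need_resched {
            self.need_resched = false;
            self.schedule();
        }
    }

    /// Stand-in for schedule(); here it only counts invocations.
    fn schedule(&mut self) {
        self.schedule_calls += 1;
    }
}

fn main() {
    let mut cpu = Cpu { need_resched: false, schedule_calls: 0 };
    cpu.timer_interrupt(false); // tick, timeslice not yet used up
    cpu.timer_interrupt(true); // tick, timeslice expired
    println!("schedule() ran {} time(s)", cpu.schedule_calls);
}
```

This is why the scheduler does not need to be a separate process: it is code that runs on the way out of an interrupt or system call, on behalf of whichever task was interrupted.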

How NOHZ=ON affects do_timer() in Linux kernel?

In a simple experiment I set NOHZ=OFF and used printk() to print how often the do_timer() function gets called. It gets called every 10 ms on my machine.
However, if NOHZ=ON then there is a lot of jitter in the way do_timer() gets called. Most of the time it does get called every 10 ms, but there are times when it completely misses the deadlines.
I have researched both do_timer() and NOHZ. do_timer() is the function responsible for updating the jiffies value and is also responsible for the round-robin scheduling of the processes.
NOHZ feature switches off the hi-res timers on the system.
What I am unable to understand is how can hi-res timers affect the do_timer()? Even if hi-res hardware is in sleep state the persistent clock is more than capable to execute do_timer() every 10 ms. Secondly if do_timer() is not executing when it should, that means some processes are not getting their timeshare when they should ideally be getting it. A lot of googling does show that for many people many applications start working much better when NOHZ=OFF.
To make a long story short, how does NOHZ=ON affect do_timer()?
Why does do_timer() miss its deadlines?
First, let's understand what a tickless kernel is (NOHZ=ON, i.e. CONFIG_NO_HZ set) and what the motivation was for introducing it into the Linux kernel in 2.6.21.
From http://www.lesswatts.org/projects/tickless/index.php,
Traditionally, the Linux kernel used a periodic timer for each CPU.
This timer did a variety of things, such as process accounting,
scheduler load balancing, and maintaining per-CPU timer events. Older
Linux kernels used a timer with a frequency of 100Hz (100 timer events
per second or one event every 10ms), while newer kernels use 250Hz
(250 events per second or one event every 4ms) or 1000Hz (1000 events
per second or one event every 1ms).
This periodic timer event is often called "the timer tick". The timer
tick is simple in its design, but has a significant drawback: the
timer tick happens periodically, irrespective of the processor state,
whether it's idle or busy. If the processor is idle, it has to wake up
from its power saving sleep state every 1, 4, or 10 milliseconds. This
costs quite a bit of energy, consuming battery life in laptops and
causing unnecessary power consumption in servers.
With "tickless idle", the Linux kernel has eliminated this periodic
timer tick when the CPU is idle. This allows the CPU to remain in
power saving states for a longer period of time, reducing the overall
system power consumption.
So reducing power consumption was one of the main motivations for the tickless kernel. But, as is usually the case, performance takes a hit when power consumption is reduced. For desktop computers, performance is of the utmost concern, and hence you see that for most of them NOHZ=OFF works pretty well.
In Ingo Molnar's own words:
The tickless kernel feature (CONFIG_NO_HZ) enables 'on-demand' timer
interrupts: if there is no timer to be expired for say 1.5 seconds
when the system goes idle, then the system will stay totally idle for
1.5 seconds. This should bring cooler CPUs and power savings: on our (x86) testboxes we have measured the effective IRQ rate to go from HZ
to 1-2 timer interrupts per second.
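The numbers in the two quotes can be checked with a little arithmetic. The sketch below is a back-of-the-envelope model only: the function names are hypothetical, and the tickless case simply counts how many pending timers expire inside the idle window, which is a simplification of what the kernel does.

```rust
/// Tick period in milliseconds for a given HZ value.
fn tick_period_ms(hz: u32) -> f64 {
    1000.0 / hz as f64
}

/// Timer interrupts an idle CPU takes over `idle_ms` with a periodic tick.
fn periodic_wakeups(hz: u32, idle_ms: u32) -> u32 {
    (hz as u64 * idle_ms as u64 / 1000) as u32
}

/// Timer interrupts with tickless idle: one per pending timer that
/// expires within the idle window (simplified model).
fn tickless_wakeups(timer_expiries_ms: &[u32], idle_ms: u32) -> u32 {
    timer_expiries_ms.iter().filter(|&&t| t <= idle_ms).count() as u32
}

fn main() {
    for hz in [100, 250, 1000] {
        println!(
            "HZ={}: tick every {} ms, {} wakeups during 1.5 s of idle",
            hz,
            tick_period_ms(hz),
            periodic_wakeups(hz, 1500)
        );
    }
    // Tickless: the only wakeup is the single timer due in 1.5 s.
    println!("tickless: {} wakeup(s)", tickless_wakeups(&[1500], 1500));
}
```

With a periodic tick at HZ=1000, an otherwise idle CPU wakes up 1500 times in the 1.5 second example from the quote; with tickless idle it wakes up once, when the pending timer expires.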
Now, let's try to answer your queries:
What I am unable to understand is how can hi-res timers affect the
do_timer ?
If a system supports high-res timers, timer interrupts can occur more frequently than the usual 10 ms on most systems, i.e. these timers try to make the system more responsive by leveraging the hardware's capabilities and firing timer interrupts even faster, say every 100 us. With the NOHZ option, these timer interrupts are suppressed while the CPU is idle, and hence do_timer executes less often.
Even if hi-res hardware is in sleep state the persistent clock is more
than capable to execute do_timer every 10ms
Yes, it is capable. But the intention of NOHZ is exactly the opposite: to prevent frequent timer interrupts!
Secondly if do_timer is not executing when it should that means some
processes are not getting their timeshare when they should ideally be
getting it
As caf noted in the comments, NOHZ does not cause processes to get scheduled less often, because it only kicks in when the CPU is idle - in other words, when no processes are schedulable. Only the process accounting stuff will be done at a delayed time.
Why does do_timer miss its deadlines?
As elaborated above, this is the intended design of NOHZ.
I suggest you go through the tick-sched.c kernel sources as a starting point. Search for CONFIG_NO_HZ and try to understand the new functionality added for the NOHZ feature.
Here is one test performed to measure the Impact of a Tickless Kernel

Resources