Linux high-resolution timer in user space

I need a thread in my process to wake up every 5 ms (precisely) and do some work.
I have used POSIX timers; they seem to be accurate about 90% of the time, and accuracy decreases further when the CPU is somewhat loaded.
I believe that is because a POSIX timer has to create a new thread on every expiry.
Is there some other reliable way to implement a high-resolution timer in Linux, and will increasing the priority of the thread help?
I am on CentOS 5.6.

The POSIX timers (created with timer_create()) are already high-resolution. Your problem is in the delivery method - if you want very precise timing then SIGEV_THREAD is not a good idea.
You could instead use SIGEV_SIGNAL so that timer expiry is notified via a signal, then use sigwaitinfo() to wait for it to expire. Alternatively, you could use a timerfd (created with timerfd_create()) instead of a POSIX timer.
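A minimal sketch of the signal-plus-sigwaitinfo() pattern, written in Python because its standard library wraps the same system calls. Python has no binding for timer_create(), so setitimer(ITIMER_REAL) with SIGALRM stands in for the POSIX timer here; that substitution and the function name run_periodic are assumptions of the example, not part of the answer above. In C the loop would have the same shape, with timer_create()/timer_settime() arming the timer.

```python
import signal
import time


def run_periodic(period_s, iterations):
    """Wake up `iterations` times, once per timer expiry; return the wake-up times."""
    sigs = {signal.SIGALRM}
    # Block the signal so it stays pending until sigwaitinfo() collects it,
    # instead of being delivered asynchronously.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, sigs)
    wakeups = []
    try:
        # First expiry after period_s, then one every period_s
        # (stand-in for timer_create() + timer_settime() with SIGEV_SIGNAL).
        signal.setitimer(signal.ITIMER_REAL, period_s, period_s)
        for _ in range(iterations):
            signal.sigwaitinfo(sigs)           # sleeps until the timer expires
            wakeups.append(time.monotonic())   # the periodic work would go here
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)   # disarm the timer
        signal.sigtimedwait(sigs, 0)              # drain an expiry that may still be pending
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    return wakeups
```

Calling run_periodic(0.005, 200) would wake 200 times at a nominal 5 ms period; the differences between consecutive entries of the returned list give a rough measure of the jitter on a given machine.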
Additionally, if you want your thread to preempt other running threads when the timer expires, you'll need to give it a real-time scheduling policy (SCHED_FIFO or SCHED_RR) with sched_setscheduler().
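As a rough illustration of the scheduling-policy step, the sketch below asks for SCHED_FIFO through Python's os.sched_setscheduler(), a thin wrapper over the system call named above. The helper name try_make_realtime and the default priority of 10 are arbitrary choices for the example. The call normally fails without root, CAP_SYS_NICE or a suitable RLIMIT_RTPRIO, which the sketch reports rather than treats as fatal.

```python
import os


def try_make_realtime(priority=10, pid=0):
    """Try to switch `pid` (0 = the caller) to SCHED_FIFO; return True on success."""
    lo = os.sched_get_priority_min(os.SCHED_FIFO)
    hi = os.sched_get_priority_max(os.SCHED_FIFO)
    priority = max(lo, min(hi, priority))   # clamp to the range the kernel accepts
    try:
        os.sched_setscheduler(pid, os.SCHED_FIFO, os.sched_param(priority))
    except PermissionError:
        # Not privileged enough to use a real-time policy.
        return False
    return True
```

A real-time thread that never blocks can starve everything else on its CPU, so it is worth testing this with a loop that is known to sleep.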
You will also want to ensure that your kernel is compiled with the CONFIG_PREEMPT option, which allows most kernel code to be preemptable. There will still be some level of jitter, caused by non-preemptible kernel work like hardware interrupts and softirqs. To reduce this further, you can try using the CONFIG_PREEMPT_RT kernel patchset.

Related

What is hrtick_clear(rq) in the Linux scheduler?

While going through the Linux kernel code, I saw hrtick_clear(rq) inside the __schedule() function.
Can anyone explain what this is and why it is used?
It seems to be something related to timers, but I am unable to get any further.
Classic OS design involves a system timer, an entity that ticks at fixed intervals. On each tick the scheduler is called and decides whether the process/thread should be switched. But the system timer frequency is fairly low (e.g. 1000 Hz, which means one tick every 1 ms), so if a process has only 100 us of its timeslice left, it will get extra time (under certain circumstances) while other processes starve.
However, modern hardware provides more precise timers, such as the HPET on Intel platforms, which the kernel exposes through the hrtimers subsystem. They can be enabled for use in the scheduler with the CONFIG_SCHED_HRTICK option.
But if you have already called __schedule() (e.g. on a system call path), there is no need to call it a second time from the hrtimer, because you are already scheduling. So before going further, hrtick_clear() disables that hrtimer.
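The 100 us example can be put into numbers with a toy calculation. This is only an idealised model, not kernel code: it assumes the task starts exactly on a tick boundary, and the function name overrun_us is made up for the illustration.

```python
def overrun_us(slice_left_us, tick_period_us, hrtick=False):
    """Extra time a task runs past its slice before the scheduler is re-entered.

    Toy model: the task is (re)started exactly on a tick boundary.
    """
    if hrtick:
        # A one-shot high-resolution timer is armed for the exact remainder.
        return 0
    # Without it, preemption can only happen on the next periodic tick.
    ticks_needed = -(-slice_left_us // tick_period_us)   # ceiling division
    return ticks_needed * tick_period_us - slice_left_us
```

With a 1000 Hz tick, overrun_us(100, 1000) gives 900: the task with 100 us left keeps the CPU for a further 900 us, which is the extra time the hrtick is meant to remove.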

Linux Threads and process - CPU affinity

I have a few questions related to thread and process scheduling.
When my process goes to sleep and wakes up again, will it always be scheduled on the same CPU it ran on before?
When I create a thread from the process, will it also always be executed on the same CPU, even if other CPUs are free and idle?
I would like to know the mechanism in Linux specifically. I am creating the threads through the pthread library. I am facing a random hang that is not always reproducible, and I need this information to proceed in the right direction.
On single-processor/single-core systems:
1) Yes
2) Yes
On multi-processor/multi-core systems:
1) No
2) No
Use taskset to retrieve or set a process's CPU affinity on multicore systems. Setting the CPU affinity to a single processor/core changes the answers to
1) Yes
2) Yes
on multicore systems as well.
From within an application you may use sched_setaffinity and/or sched_getaffinity to adjust the CPU affinity.
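A short sketch of those two calls through Python's os module, which wraps them directly. The helper name pin_to_one_cpu and the choice of the lowest-numbered allowed CPU are arbitrary, for illustration only.

```python
import os


def pin_to_one_cpu(pid=0):
    """Restrict `pid` (0 = the caller) to one allowed CPU; return (old_set, cpu)."""
    allowed = os.sched_getaffinity(pid)   # set of CPU numbers we may currently run on
    cpu = min(allowed)                    # arbitrary choice for the sketch
    os.sched_setaffinity(pid, {cpu})
    return allowed, cpu
```

The returned old set can be passed back to os.sched_setaffinity() to undo the pinning. On Linux, threads created after the call inherit the mask of the thread that creates them.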
Edit: Additional details about how/when CPU swaps are managed with respect to cache disadvantages:
The Linux/SMP Scheduler: "... In order to achieve good system performance, Linux/SMP (2.4 kernel) adopts an empirical rule to solve the dilemma ..." Read the details in the linked reference, section The Linux/SMP Scheduler.
For the newer CFS (Completely Fair Scheduler) you'd look at sched_migration_cost. "...if the real runtime of the task is smaller than the values of this parameter then the scheduler assumes that it is still in the cache and tries to avoid moving the task to another CPU during the load balancing procedure ..." (e.g.: Completely Fair Scheduler and its tuning).
When a process goes to sleep and later wakes up, it is not necessarily scheduled on the same CPU. In a multiprocessor environment it may be scheduled on any CPU, according to the scheduler policy. A process goes to sleep for different reasons, for example because it is waiting for I/O or some other resource. When the event occurs, the process moves from the waiting state to the ready state, and the scheduler then places it on whichever CPU is free, which need not be the same one.
For more information about the scheduler, read the scheduler source code in the Linux release tree.

Process Scheduling from Processor point of view

I understand that the scheduling is done by the kernel. Let us suppose a
process (P1) in Linux is currently executing on the processor.
Since the current process doesn't know anything about the time slice
and the kernel is currently not executing on the processor, how does the kernel schedule the next process to execute?
Is there some kind of interrupt to tell the processor to switch to execute the kernel or any other mechanism for the purpose?
In brief, it is an interrupt that gives control back to the kernel. The interrupt may occur for any reason.
Most of the time the kernel gets control because of a timer interrupt, but a key-press interrupt may wake up the kernel as well.
An interrupt signalling completion of I/O with a peripheral, or virtually anything else that changes the system state, may also wake up the kernel.
More about interrupts:
Interrupt handling is divided into a top half and a bottom half. Bottom halves exist to defer work out of interrupt context.
Top half: runs with interrupts disabled, so it must be very fast and relinquish the CPU as soon as possible. It usually
1) saves the interrupt state flag and disables interrupts (resets some pin on the processor),
2) communicates with the hardware, stores state information, and delegates the remaining work to the bottom half,
3) restores the interrupt state flag and re-enables interrupts (sets some pin on the processor).
Bottom half: handles the deferred work (delegated by the top half). It runs with interrupts enabled and may therefore take a while to complete.
Two commonly used mechanisms for bottom-half processing are:
1) Tasklets
2) Work queues
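The split between the two halves can be mimicked with a plain queue. This is a user-space toy, not kernel code: the names top_half, bottom_half, deferred and log are invented for the illustration, and nothing here actually disables interrupts.

```python
from collections import deque

deferred = deque()   # work handed over from top halves to the bottom half
log = []             # records the order in which things happen


def top_half(irq, data):
    """Stands for the part that runs with interrupts disabled: do the minimum, return."""
    log.append(f"top:{irq}")
    deferred.append((irq, data))   # delegate the rest


def bottom_half():
    """Stands for the part that runs later, with interrupts enabled."""
    while deferred:
        irq, data = deferred.popleft()
        log.append(f"bottom:{irq}:{len(data)} bytes")
```

Calling top_half() for two devices and then bottom_half() once leaves both "top" entries in the log ahead of both "bottom" entries, which is the point of the design: acknowledging the hardware is never held up by the slower processing.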
If the timer is the interrupt that switches back to the kernel, is that interrupt a hardware interrupt?
The timer interrupt of interest in the context of this discussion is the hardware timer interrupt.
Inside the kernel, the term "timer interrupt" may mean either (architecture-dependent) hardware timer interrupts or software timer interrupts.
Read this for a brief overview.
More about timers
Remember that timers are an advanced topic and difficult to comprehend.
Is the interrupt a hardware interrupt? If it is a hardware interrupt, what is the frequency of the timer?
Read Chapter 10. Timers and Time Management
If the interval of the timer is shorter than the time slice, will the kernel give the CPU back to the same process that was running earlier?
It depends on many factors, for example the scheduler being used, the load on the system and the process priorities.
CFS, the most widely used scheduler, does not really depend on the notion of a time slice for preemption.
The next suitable process as picked up by CFS will get the CPU time.
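The text above does not say how CFS chooses the next task. In broad strokes it tracks a weighted "virtual runtime" per task and runs the task that has accumulated the least. The toy model below captures only that idea, with a heap standing in for the kernel's red-black tree; run_cfs_toy, the weights and the 1 ms step are made up for illustration.

```python
import heapq


def run_cfs_toy(tasks, steps, quantum_ms=1):
    """tasks: {name: weight}. Return the order in which tasks were picked.

    Each step runs the task with the smallest virtual runtime for quantum_ms;
    its virtual runtime then advances by quantum_ms / weight.
    """
    queue = [(0.0, name) for name in sorted(tasks)]   # (vruntime, name)
    heapq.heapify(queue)
    order = []
    for _ in range(steps):
        vruntime, name = heapq.heappop(queue)          # task that has run the least
        order.append(name)
        heapq.heappush(queue, (vruntime + quantum_ms / tasks[name], name))
    return order
```

With run_cfs_toy({"a": 2, "b": 1}, 6), task a is picked four times and task b twice: a heavier weight makes virtual time pass more slowly, so the task earns a larger share without any fixed time slice being assigned.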
The relation between timer ticks, time-slice and context switching is not so straight-forward.
Each process has its own (dynamically calculated) time slice. The kernel keeps track of the time slice used by the process.
On SMP, CPU-specific activities such as monitoring the execution time of the currently running process are driven by the interrupts raised by the local APIC timer.
The local APIC timer sends an interrupt only to its own processor.
However, a default time slice (RR_TIMESLICE, used by the round-robin real-time policy) is defined in include/linux/sched/rt.h
Read this.
A few things could happen:
a. The current process (P1) can use up its timeslice, and the scheduler will then check whether there is any other process that could be run. If nothing is runnable, the CPU goes into the idle state. The scheduler will assign P1 to the CPU again if P1 is a CPU-bound task or did not leave the CPU voluntarily.
b. Another possibility is that a high-priority task has jumped in. On every scheduler tick, the scheduler checks whether there is any process that needs the CPU badly and should preempt the current task.
In other words, a process can leave the CPU in two ways: voluntarily or involuntarily. In the first case, the process puts itself to sleep and therefore releases the CPU (case a). In the other case, the process is preempted by a higher-priority task.
(Note: This answer is based on the CFS task scheduler of the current Linux kernel.)

How does the OS scheduler regain control of CPU?

I recently started to learn how the CPU and the operating system works, and I am a bit confused about the operation of a single-CPU machine with an operating system that provides multitasking.
Supposing my machine has a single CPU, this would mean that, at any given time, only one process could be running.
Now, I can only assume that the scheduler used by the operating system to control the access to the precious CPU time is also a process.
Thus, in this machine, either the user process or the scheduling system process is running at any given point in time, but not both.
So here's a question:
Once the scheduler gives up control of the CPU to another process, how can it regain CPU time to run itself again to do its scheduling work? I mean, if any given process currently running does not yield the CPU, how could the scheduler itself ever run again and ensure proper multitasking?
So far, I had been thinking, well, if the user process requests an I/O operation through a system call, then in the system call we could ensure the scheduler is allocated some CPU time again. But I am not even sure if this works in this way.
On the other hand, if the user process in question were inherently CPU-bound, then, from this point of view, it could run forever, never letting other processes, not even the scheduler run again.
Supposing time-sliced scheduling, I have no idea how the scheduler could slice the time for the execution of another process when it is not even running?
I would really appreciate any insight or references that you can provide in this regard.
The OS sets up a hardware timer (Programmable interval timer or PIT) that generates an interrupt every N milliseconds. That interrupt is delivered to the kernel and user-code is interrupted.
It works like any other hardware interrupt. For example your disk will force a switch to the kernel when it has completed an IO.
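A user-space analogy, not the kernel mechanism itself: in the sketch below a periodic timer signal keeps interrupting a loop that never sleeps or yields, much as the timer interrupt takes the CPU away from a CPU-bound process. The function name count_ticks is hypothetical, and the code must run in the main thread because Python only allows signal handlers to be installed there.

```python
import signal


def count_ticks(period_s, wanted):
    """Busy-loop without ever yielding; a periodic timer still gets control."""
    ticks = 0

    def on_tick(signum, frame):   # plays the role of the interrupt handler
        nonlocal ticks
        ticks += 1

    old_handler = signal.signal(signal.SIGALRM, on_tick)
    signal.setitimer(signal.ITIMER_REAL, period_s, period_s)
    spins = 0
    try:
        while ticks < wanted:     # "CPU-bound process": no sleep, no I/O
            spins += 1
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)   # stop the timer
        # Restore the previous handler (None means it was not set from Python).
        signal.signal(signal.SIGALRM,
                      signal.SIG_DFL if old_handler is None else old_handler)
    return ticks, spins
```

The loop has no code that checks a clock, yet it terminates, because the handler runs "behind its back" and changes the state it depends on.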
Google "interrupts". Interrupts are at the centre of multithreading, preemptive kernels like Linux/Windows. With no interrupts, the OS will never do anything.
While investigating/learning, try to ignore any explanations that mention "timer interrupt", "round-robin" and "time-slice", or "quantum" in the first paragraph – they are dangerously misleading, if not actually wrong.
Interrupts, in OS terms, come in two flavours:
Hardware interrupts – those initiated by an actual hardware signal from a peripheral device. These can happen at (nearly) any time and switch execution from whatever thread might be running to code in a driver.
Software interrupts – those initiated by OS calls from currently running threads.
Either interrupt may request the scheduler to make threads that were waiting ready/running or cause threads that were waiting/running to be preempted.
The most important interrupts are those hardware interrupts from peripheral drivers – those that make threads ready that were waiting on IO from disks, NIC cards, mice, keyboards, USB etc. The overriding reason for using preemptive kernels, and all the problems of locking, synchronization, signaling etc., is that such systems have very good IO performance because hardware peripherals can rapidly make threads ready/running that were waiting for data from that hardware, without any latency resulting from threads that do not yield, or waiting for a periodic timer reschedule.
The hardware timer interrupt that causes periodic scheduling runs is important because many system calls have timeouts in case, say, a response from a peripheral takes longer than it should.
On multicore systems the OS has an interprocessor driver that can cause a hardware interrupt on other cores, allowing the OS to interrupt/schedule/dispatch threads onto multiple cores.
On seriously overloaded boxes, or those running CPU-intensive apps (a small minority), the OS can use the periodic timer interrupts, and the resulting scheduling, to cycle through a set of ready threads that is larger than the number of available cores, and allow each a share of available CPU resources. On most systems this happens rarely and is of little importance.
Every time I see "quantum", "give up the remainder of their time-slice", "round-robin" and similar, I just cringe...
To complement @usr's answer, quoting from Understanding the Linux Kernel:
The schedule( ) Function
schedule( ) implements the scheduler. Its objective is to find a
process in the runqueue list and then assign the CPU to it. It is
invoked, directly or in a lazy way, by several kernel routines.
[...]
Lazy invocation
The scheduler can also be invoked in a lazy way by setting the
need_resched field of current [process] to 1. Since a check on the value of this
field is always made before resuming the execution of a User Mode
process (see the section "Returning from Interrupts and Exceptions" in
Chapter 4), schedule( ) will definitely be invoked at some close
future time.

How does a kernel return from the thread

I am doing some hardcore study on computers etc. so I can get started on my own mini Hello World OS.
I was looking at how kernels work, and I was wondering how the kernel makes the current thread return to the kernel (so it can switch to another) even though the kernel isn't running and the thread has no instruction to do so.
Does it use some kind of CPU interrupt that goes back to the kernel after a few nanoseconds?
It is during timer interrupts and (blocking) system calls that the kernel decides whether to keep executing the currently active thread(s) or switch to another thread. The timer interrupt handler updates resource usage, such as consumed system and user time, for the currently running process and calls the scheduler_tick() function, which decides whether a process/thread needs to be preempted.
See "Preemption and Context Switching" on page 62 of the Linux Kernel Development book.
The kernel, however, must know when to call schedule(). If it called schedule() only
when code explicitly did so, user-space programs could run indefinitely. Instead, the kernel
provides the need_resched flag to signify whether a reschedule should be performed (see
Table 4.1). This flag is set by scheduler_tick() when a process should be preempted, and
by try_to_wake_up() when a process that has a higher priority than the currently running
process is awakened. The kernel checks the flag, sees that it is set, and calls schedule() to
switch to a new process. The flag is a message to the kernel that the scheduler should be
invoked as soon as possible because another process deserves to run.
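The flag-based hand-off in the quote can be sketched as a single-CPU toy. The method names mirror the kernel functions mentioned above, but the class ToyKernel, the (name, priority) tuples and the pick-highest-priority rule are simplifications invented for this example.

```python
class ToyKernel:
    """Single-CPU toy: `current` runs until need_resched is seen on return to user mode."""

    def __init__(self, tasks, timeslice_ticks):
        self.ready = list(tasks)          # (name, priority); larger = more urgent
        self.timeslice = timeslice_ticks
        self.current = self._pick()
        self.left = timeslice_ticks
        self.need_resched = False

    def _pick(self):
        best = max(self.ready, key=lambda t: t[1])   # first task with the top priority
        self.ready.remove(best)
        return best

    def scheduler_tick(self):             # called from the timer interrupt
        self.left -= 1
        if self.left <= 0:
            self.need_resched = True      # only sets the flag, does not switch

    def try_to_wake_up(self, task):       # called when an awaited event occurs
        self.ready.append(task)
        if task[1] > self.current[1]:
            self.need_resched = True

    def return_to_user(self):             # end of every interrupt or system call
        if self.need_resched:
            self.schedule()
        return self.current[0]

    def schedule(self):
        self.ready.append(self.current)
        self.current = self._pick()
        self.left = self.timeslice
        self.need_resched = False
```

Neither scheduler_tick() nor try_to_wake_up() switches tasks directly; the switch happens at the next return_to_user(), which is the "as soon as possible" in the quote.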
Does it use some kind of CPU interrupt
Yes! Modern preemptive kernels are absolutely dependent upon interrupts from hardware to deliver good I/O performance. Keyboard, mouse, disk, NIC, USB, etc. drivers are all entered from interrupts and can make threads that are waiting on them ready/running when required (e.g., when data is available).
Threads can also change state as a result of making an OS call that changes the caller's own state or that of another thread.
The interrupt from the hardware timer is one of many interrupt sources and is only special in that many system operations have timeouts that are signaled by this interrupt. Other than that, the timer interrupt just causes a reschedule which, in most cases, changes nothing re. the ready/running state of threads. If the machine is grossly CPU-overloaded to the point where there are more ready threads than there are cores, there is a side-effect of the timer interrupt that causes CPU time to be shared amongst the ready threads.
Do not fixate on the timer interrupt—the other driver interrupts are absolutely essential. It is not impossible to build a functional preemptive multithreaded kernel with no timer interrupt at all.
