Exactly when does a tasklet run after it is scheduled by an ISR? - linux

I wrote my ISR and my tasklet ran immediately. But I have seen people say that a tasklet runs only when it gets CPU attention. "CPU attention" is a very generic term, so I am asking those responders to be specific: at exactly which moment does the CPU turn to tasklet execution, and what happens to the state of the CPU?
Secondly, suppose I keep getting hard interrupts: when will the tasklet get a chance to run? Is it possible that the tasklet never gets a chance to run? How does the kernel take care of these things?

TL;DR: Tasklets are run by ksoftirqd threads, which are handled by the scheduler.
A tasklet is just a form of softirq (tasklets are handled as the TASKLET_SOFTIRQ softirq), so the rules on when softirqs run apply to tasklets too. Here they are, according to Robert Love's book "Linux Kernel Development":
1) In the return from hardware interrupt code path
2) In the ksoftirqd kernel thread
3) In any code that explicitly checks for and executes pending softirqs, such as the networking subsystem
It seems that case (1) will not apply if the threadirqs kernel boot parameter is set. Note that this parameter is off by default on mainline kernels, whereas PREEMPT_RT kernels always force threaded interrupt handlers.
UPD: Some notes on the relation between ksoftirqd and the scheduler.
This is what seems to happen:
1) In the hardirq handler you wake up ksoftirqd (due to tasklet_schedule())
2) wake_up_process() then checks whether ksoftirqd may preempt the current thread
3) If (2) is true, the TIF_NEED_RESCHED flag is set
4) On the return from the hardirq (ret_from_intr on x86) the TIF_NEED_RESCHED flag is checked
5) If (4) is true, schedule() is called to pick the next thread to execute.
There is a high chance that ksoftirqd will be considered a preemption candidate in (2-3) and picked in (5), but if there are competitors, ksoftirqd has to wait until the next schedule() cycle: the current thread gives up the CPU (e.g. by sleeping), a clock tick happens, or a syscall or new interrupt occurs.
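The wake-up and return-from-interrupt steps above can be condensed into a toy user-space model. This is only a sketch, not kernel code: struct task, struct cpu, wake_up_model() and ret_from_intr_model() are hypothetical names, priority is reduced to a single integer where a larger value wins, and the real scheduler's preemption check is considerably more involved.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified task: a larger prio value wins. */
struct task {
    const char *name;
    int prio;
    bool runnable;
};

/* Hypothetical per-CPU state for the model. */
struct cpu {
    struct task *current;  /* task that was interrupted by the hardirq */
    struct task *woken;    /* most recently woken task, if any */
    bool need_resched;     /* stands in for TIF_NEED_RESCHED */
};

/* Waking a task only marks it runnable. The flag is set only if the
 * woken task should preempt the current one. */
static void wake_up_model(struct cpu *c, struct task *t)
{
    t->runnable = true;
    c->woken = t;
    if (t->prio > c->current->prio)
        c->need_resched = true;
}

/* On return from the hardirq the flag is checked. If it is set, the
 * model "schedules" by switching to the woken task. */
static struct task *ret_from_intr_model(struct cpu *c)
{
    if (c->need_resched && c->woken && c->woken->runnable) {
        c->need_resched = false;
        c->current = c->woken;
        c->woken = NULL;
    }
    return c->current;
}
```

In this model, waking a ksoftirqd-like task with prio 5 while a prio 1 task is current switches to it on interrupt return, while the same wake-up under a prio 9 task leaves the current task running, which mirrors the "competitors" case described above.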

Related

How does the scheduler know that a thread is blocked waiting for input?

When a thread executing user code is waiting for input, how does the scheduler know to interrupt it, or how does the thread know to call the scheduler? The average programmer of a simple single-threaded application is unlikely to insert sched_yield() everywhere. Does the compiler insert sched_yield() during optimisation, does the thread just spin until the general timer interrupt set by the scheduler fires, or does the user have to explicitly call functions such as wait() or sleep() for the context to switch?
This question is especially relevant if the scheduler is not preemptive, because then the thread has to call the scheduler when it is waiting for input for throughput to be effective, but I'm not sure how it does this.
Be careful not to confuse preemption with the ability of a process to sleep. Processes can sleep even with a non-preempting scheduler. This is what happens when a process is waiting for I/O. The process makes a system call such as read() and the device determines no data is available. It then internally puts the process to sleep by updating a data structure used by the scheduler. The scheduler then executes other processes until an interrupt or some other event occurs that wakes the original process. The awoken process then becomes eligible again for scheduling.
On the other hand preemption is the ability of an architecture's scheduler to stop execution of a process without its cooperation. The interruption can occur anywhere in the program's instruction stream. Control returns to the scheduler which can then execute other processes and return to the interrupted (preempted) process later. Most schedulers allocate time slices where a process is allowed to run for up to a predetermined amount of time, after which it is preempted if higher-priority processes need time slices.
Unless you're writing drivers or kernel code, you don't need to worry about the underlying mechanisms too much. When writing user-space applications the key concepts are (1) that some system calls may block which means your process is put to sleep until an event occurs, and (2) on preemptible systems (all mainstream modern operating systems) your program may be preempted at any time so that other processes can run.
* Note that on some platforms, such as Linux, a thread is really just another process which shares its virtual address space with another process. Processes and threads are therefore treated exactly the same by the scheduler.
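The point about blocking system calls can be seen from user space with an ordinary pipe. The following is a minimal sketch assuming a POSIX system; blocking_read_demo() is a hypothetical helper name, and the 50 ms delay is arbitrary. The parent never calls sched_yield(): it simply calls read(), and the kernel puts it to sleep until the child writes.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Returns the byte received from the child, or -1 on error. */
static int blocking_read_demo(void)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: wait a little, then make data available. */
        struct timespec delay = { .tv_sec = 0, .tv_nsec = 50 * 1000 * 1000 };
        char byte = 'x';
        close(fds[0]);
        nanosleep(&delay, NULL);
        ssize_t w = write(fds[1], &byte, 1);
        _exit(w == 1 ? 0 : 1);
    }

    /* Parent: the pipe is empty, so read() blocks. The kernel marks this
     * process as sleeping and runs other tasks until the child writes. */
    close(fds[1]);
    char byte = 0;
    ssize_t n = read(fds[0], &byte, 1);
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return n == 1 ? (unsigned char)byte : -1;
}
```

While the parent is inside read() it should consume no CPU time, which is the difference between sleeping in the kernel and spinning in user space.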
It is not clear to me whether your question is about theory or practice. In practice, in every modern operating system, I/O operations are privileged. This means that in order for a user process or thread to access files, devices and so on, it must issue a system call.
Then the kernel has the opportunity to do whatever it considers appropriate. For example, it can check whether the I/O operation will block and therefore switch the running process (i.e. "call" the scheduler) after issuing the operation.
Note that this mechanism can work even when there is no timer interrupt handled by the kernel. Anyway, in general it will depend upon your system. For example, in an embedded system where no OS exists (or only a minimal one), it could be entirely the responsibility of the user's code to invoke the scheduler before issuing a blocking operation.
It is the kernel that can be preemptive, not the scheduler.
First, sched_yield() and wait() are forms of voluntary preemption: the process itself gives up the CPU, even if the kernel is non-preemptive.
If the kernel is able to switch to another process when the time quantum has expired or a higher-priority process becomes runnable, then we are talking about involuntary preemption, i.e. a preemptive kernel, and it can happen at the different places explained below.
The difference is that in sched_yield() the process stays in the runnable TASK_RUNNING state but goes to the end of the run queue for its static priority. The process must wait to get the CPU again.
On the other hand, wait() puts the process into a sleeping TASK_(UN)INTERRUPTIBLE state on a wait queue, calls schedule() and waits for an event to occur. When the event occurs, the process is moved to the run queue again. But that doesn't mean that it will get the CPU immediately.
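The difference between yielding and waiting can be sketched with a toy run queue. This is an illustrative model only: struct runqueue and the model_* functions are hypothetical names, there is a single priority level, and index 0 stands for the task that currently has the CPU.

```c
#include <stddef.h>

#define MAX_TASKS 8

/* Hypothetical run queue for one priority level; ids[0] has the CPU. */
struct runqueue {
    int ids[MAX_TASKS];
    size_t len;
};

static void rq_push_tail(struct runqueue *rq, int id)
{
    if (rq->len < MAX_TASKS)
        rq->ids[rq->len++] = id;
}

/* Remove and return the head of the queue, or -1 if it is empty. */
static int rq_pop_head(struct runqueue *rq)
{
    if (rq->len == 0)
        return -1;
    int id = rq->ids[0];
    for (size_t i = 1; i < rq->len; i++)
        rq->ids[i - 1] = rq->ids[i];
    rq->len--;
    return id;
}

/* sched_yield(): the task stays runnable but moves to the tail. */
static void model_yield(struct runqueue *rq)
{
    int id = rq_pop_head(rq);
    if (id != -1)
        rq_push_tail(rq, id);
}

/* wait(): the task leaves the run queue entirely. The returned id
 * represents its entry on a wait queue. */
static int model_wait(struct runqueue *rq)
{
    return rq_pop_head(rq);
}

/* Wake-up: the task is runnable again, but it is queued behind the
 * others and does not get the CPU immediately. */
static void model_wake(struct runqueue *rq, int id)
{
    rq_push_tail(rq, id);
}
```

Starting from the queue 1, 2, 3, a yield by task 1 gives 2, 3, 1. If task 2 then waits, the queue becomes 3, 1, and waking task 2 later produces 3, 1, 2 rather than handing it the CPU straight away.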
The following explains when schedule() can be called after a process is woken up:
Wakeups don't really cause entry into schedule(). They add a task to the run-queue and that's it.
If the new task added to the run-queue preempts the current task, then the wakeup sets TIF_NEED_RESCHED and schedule() gets called on the nearest possible occasion:
- If the kernel is preemptible (CONFIG_PREEMPT=y):
  - in syscall or exception context, at the next outmost preempt_enable(). (this might be as soon as the wake_up()'s spin_unlock()!)
  - in IRQ context, return from interrupt-handler to preemptible context
- If the kernel is not preemptible (CONFIG_PREEMPT is not set) then at the next:
  - cond_resched() call
  - explicit schedule() call
  - return from syscall or exception to user-space
  - return from interrupt-handler to user-space

What will happen to tasklet execution if an interrupt occurs in between

The things I know about tasklets:
A tasklet runs with all interrupts enabled.
A tasklet runs in interrupt context.
It can't sleep.
It runs atomically.
It is guaranteed to be scheduled no later than the next tick.
My questions:
Since all interrupts are enabled in a bottom half, what happens if an interrupt arrives while a tasklet is running? (If interrupts were disabled during tasklet execution, what would be the benefit of a tasklet?)
Why is it guaranteed that a tasklet will always be scheduled by the next tick?
Is it correct to say that tasklets are softirqs with priority level 0 (high-priority tasklet) and priority level 6 (normal tasklet)?
*Since all interrupts are enabled in a bottom half, what happens if an interrupt arrives while a tasklet is running? (If interrupts were disabled during tasklet execution, what would be the benefit of a tasklet?)*
From what I understand, a tasklet (which is built on softirqs) runs in softirq context, which essentially means it runs in the context of whatever process was running when that process was interrupted by the hard IRQ (so it is borrowing that process's stack). If another interrupt arrives, it is serviced and control then returns to the tasklet.
*Is it correct to say that tasklets are softirqs with priority level 0 (high-priority tasklet) and priority level 6 (normal tasklet)?*
Yes, tasklets are essentially wrappers built on softirqs.
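The priority levels 0 and 6 from the question correspond to positions in the kernel's softirq enumeration. The following user-space sketch models how pending softirqs are scanned from the lowest index upward, so a high-priority tasklet (HI_SOFTIRQ) is serviced before a normal one (TASKLET_SOFTIRQ). The enumeration is reproduced from include/linux/interrupt.h of recent kernels as best I recall and may differ on older ones; raise_softirq_model() and do_softirq_model() are hypothetical names, not kernel functions.

```c
#include <stddef.h>

/* Softirq indices; a lower index is serviced first. */
enum {
    HI_SOFTIRQ = 0,      /* high-priority tasklets */
    TIMER_SOFTIRQ,
    NET_TX_SOFTIRQ,
    NET_RX_SOFTIRQ,
    BLOCK_SOFTIRQ,
    IRQ_POLL_SOFTIRQ,
    TASKLET_SOFTIRQ,     /* normal tasklets, index 6 */
    SCHED_SOFTIRQ,
    HRTIMER_SOFTIRQ,
    RCU_SOFTIRQ,
    NR_SOFTIRQS
};

/* Model of the per-CPU pending mask: one bit per raised softirq. */
static unsigned int pending;

static void raise_softirq_model(unsigned int nr)
{
    pending |= 1u << nr;
}

/* Take a snapshot of the pending mask, clear it, then walk the bits
 * from 0 upward. The service order is recorded in out[] and the
 * number of entries written is returned. */
static size_t do_softirq_model(unsigned int *out, size_t cap)
{
    size_t n = 0;
    unsigned int snapshot = pending;

    pending = 0;
    for (unsigned int nr = 0; nr < NR_SOFTIRQS; nr++) {
        if ((snapshot & (1u << nr)) && n < cap)
            out[n++] = nr;
    }
    return n;
}
```

Raising TASKLET_SOFTIRQ first and HI_SOFTIRQ second still records HI_SOFTIRQ first in the service order, because the order depends on the bit position and not on the order of raising.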

How is interrupt context "restored" when an interrupt handler is interrupted by another interrupt?

I read some related posts:
(1) From Robert Love: http://permalink.gmane.org/gmane.linux.kernel.kernelnewbies/1791
You cannot sleep in an interrupt handler because interrupts do not have a backing
process context, and thus there is nothing to reschedule back into. In other
words, interrupt handlers are not associated with a task, so there is nothing to
"put to sleep" and (more importantly) "nothing to wake up". They must run
atomically.
(2) From Which context are softirq and tasklet in?
If sleep is allowed, then the linux cannot schedule them and finally cause a
kernel panic with a dequeue_task error. The interrupt context does not even
have a data structure describing the register info, so they can never be scheduled
by linux. If it is designed to have that structure and can be scheduled, the
performance for interrupt handling process will be effected.
So in my understanding, interrupt handlers run in interrupt context and cannot sleep, that is to say, they cannot perform a context switch the way normal processes do with their backing mechanism.
But an interrupt handler can be interrupted by another interrupt. And when the second interrupt handler finishes its work, control flow jumps back to the first interrupt handler.
How is this "restoring" implemented without a normal context switch? Is it like normal function calls, with all the registers and other related state stored on a certain stack?
The short answer is that an interrupt handler, if it can be interrupted by an interrupt, is interrupted precisely the same way anything else is interrupted by an interrupt.
Say process X is running. If process X is interrupted, then the interrupt handler runs. To the extent there is a context, it's still process X, though it's now running interrupt code in the kernel (think of the state as X->interrupt if you like). If another interrupt occurs, then the interrupt is interrupted, but there is still no special process context. The state is now X->first_interrupt->second_interrupt. When the second interrupt finishes, the first interrupt will resume just as X will resume when the first interrupt finishes. Still, the only process context is process X.
You can describe these as context switches, but they aren't like process context switches. They're more analogous to entering and exiting the kernel -- the process context stays the same but the execution level and unit of code can change.
The interrupt entry routine stores some CPU state and registers before entering the real interrupt handler, and restores this information before returning to the interrupted task. Normally, this kind of storing and restoring is not called a context switch, as the context of the interrupted process is not changed.
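The X->first_interrupt->second_interrupt picture behaves like a stack of saved register frames. The following toy model shows this under heavy simplification: the saved state is reduced to a single program counter, and struct frame, struct cpu, irq_enter_model() and irq_return_model() are hypothetical names rather than kernel interfaces.

```c
#include <stddef.h>

#define MAX_DEPTH 8

/* Hypothetical saved CPU state: only a program counter here. */
struct frame {
    unsigned long pc;
};

struct cpu {
    unsigned long pc;               /* what the CPU is executing now */
    struct frame stack[MAX_DEPTH];  /* stands in for the kernel stack */
    size_t depth;                   /* number of interrupted activities */
};

/* Interrupt entry: save the interrupted state, then jump to the
 * handler. Returns 0 on success, -1 if the model stack is full. */
static int irq_enter_model(struct cpu *c, unsigned long handler_pc)
{
    if (c->depth == MAX_DEPTH)
        return -1;
    c->stack[c->depth++].pc = c->pc;
    c->pc = handler_pc;
    return 0;
}

/* Interrupt return: restore the most recently saved state.
 * Returns 0 on success, -1 if nothing was interrupted. */
static int irq_return_model(struct cpu *c)
{
    if (c->depth == 0)
        return -1;
    c->pc = c->stack[--c->depth].pc;
    return 0;
}
```

No scheduler is involved at any point: each return pops exactly the frame that the matching entry pushed, which is why the first handler resumes after the second finishes and process X resumes after the first.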
As of 2020, interrupts (hard IRQs here) in Linux generally do not nest on a local CPU. This is mentioned at least twice by groups and maintainers actively contributing to the Linux kernel:
From NAPI updates written by Jakub Kicinski in 2020:
…Because normal interrupts don't nest in Linux, the system can't service any new interrupt while it's already processing one.
And from Bootlin in 2022:
…Interrupt handlers are run with all interrupts disabled on the local CPU…
So this question is probably less relevant nowadays, at least for the Linux kernel.

Process Scheduling from Processor point of view

I understand that the scheduling is done by the kernel. Let us suppose a
process (P1) in Linux is currently executing on the processor.
Since the current process doesn't know anything about the time slice
and the kernel is currently not executing on the processor, how does the kernel schedule the next process to execute?
Is there some kind of interrupt to tell the processor to switch to execute the kernel or any other mechanism for the purpose?
In brief, it is an interrupt which gives control back to the kernel. The interrupt may occur for any reason.
Most of the time the kernel gets control due to the timer interrupt, but a key-press interrupt might also wake up the kernel.
An interrupt signalling completion of I/O with a peripheral, or virtually anything else that changes the system state, may wake up the kernel.
More about interrupts:
Interrupt handling as such is divided into a top half and a bottom half. Bottom halves are for deferring work from interrupt context.
Top-half: runs with interrupts disabled, so it should be very fast and relinquish the CPU as soon as possible. It usually
1) stores the interrupt state flag and disables interrupts (resets some pin on the processor),
2) communicates with the hardware, stores state information, and delegates the remaining responsibility to the bottom half,
3) restores the interrupt state flag and enables interrupts (sets some pin on the processor).
Bottom-half: handles the deferred work (the work delegated by the top half). It runs with interrupts enabled, so it may take a while before completion.
Two mechanisms are used to implement bottom-half processing.
1) Tasklets
2) Work queues
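The division of labour between the two halves can be sketched as a small producer/consumer queue. This is a user-space illustration only, not the tasklet or workqueue API: struct deferred_queue, top_half_model() and bottom_half_model() are hypothetical names, and "processing" is reduced to summing the queued values.

```c
#include <stddef.h>

#define QUEUE_CAP 16

/* Hypothetical ring buffer of work handed from top to bottom half. */
struct deferred_queue {
    int items[QUEUE_CAP];
    size_t head;  /* next item to process */
    size_t tail;  /* next free slot */
};

/* Top half: do the minimum. Grab the value from the "hardware",
 * queue it and return. Returns 0 on success, -1 if the queue is full. */
static int top_half_model(struct deferred_queue *q, int hw_data)
{
    if (q->tail - q->head == QUEUE_CAP)
        return -1;
    q->items[(q->tail++) % QUEUE_CAP] = hw_data;
    return 0;
}

/* Bottom half: runs later and drains everything that was queued.
 * The sum stands in for the real, slower processing. */
static long bottom_half_model(struct deferred_queue *q)
{
    long processed = 0;

    while (q->head != q->tail)
        processed += q->items[(q->head++) % QUEUE_CAP];
    return processed;
}
```

Several top-half invocations can queue data before a single bottom-half run handles all of it, which is how the expensive part of the work is kept out of the interrupts-disabled window.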
If timer is the interrupt to switch back to kernel, is the interrupt a hardware interrupt???
The timer interrupt of interest under our context of discussion is the hardware timer interrupt,
Inside kernel, the word timer interrupt may either mean (architecture-dependent) hardware timer interrupts or software timer interrupts.
Read this for a brief overview.
More about timers
Remember that timers are an advanced topic and difficult to comprehend.
is the interrupt a hardware interrupt??? if it is a hardware
interrupt, what is the frequency of the timer?
Read Chapter 10. Timers and Time Management
if the interval of the timer is shorter than time slice, will kernel give the CPU back the same process, which was running early?
It depends upon many factors, for example the scheduler being used, the load on the system, process priorities, and things like that.
The most popular CFS doesn't really depend upon the notion of time slice for preemption!
The next suitable process as picked up by CFS will get the CPU time.
The relation between timer ticks, time-slice and context switching is not so straight-forward.
Each process has its own (dynamically calculated) time slice. The kernel keeps track of the time slice used by the process.
On SMP, the CPU specific activities such as monitoring the execution time of the currently running process is done by the interrupts raised by the local APIC timer.
The local APIC timer sends an interrupt only to its processor.
However, the default time slice for SCHED_RR tasks (RR_TIMESLICE) is defined in include/linux/sched/rt.h
Read this.
A few things could happen:
a. The current process (p1) can finish its timeslice, and then the scheduler will check if there is any other process that could be run. If there's no other process, the scheduler will put itself in the idle state. The scheduler will assign p1 to the CPU if p1 is a CPU-hogging task or p1 didn't leave the CPU voluntarily.
b. Another possibility is that a high-priority task has jumped in. On every scheduler tick, the scheduler will check if there's any process which needs the CPU badly and is likely to preempt the current task.
In other words, a process can leave the CPU in two ways: voluntarily or involuntarily. In the first case, the process puts itself to sleep and therefore releases the CPU (case a). In the other case, the process is preempted by a higher-priority task.
(Note: This answer is based on the CFS task scheduler of the current Linux kernel)

How does a kernel return from the thread

I am doing some hardcore study on computers so I can get started on my own mini Hello World OS.
I was looking at how kernels work, and I was wondering how the kernel makes the current thread return to the kernel (so it can switch to another) even though the kernel isn't running and the thread has no instruction to do so.
Does it use some kind of CPU interrupt that goes back to the kernel after a few nanoseconds?
Does it use some kind of CPU interrupt that goes back to the kernel after a few nanoseconds?
It is during timer interrupts and (blocking) system calls that the kernel decides whether to keep executing the currently active thread(s) or switch to another thread. The timer interrupt handler updates resource usage, such as consumed system and user time, for the currently running process and calls the scheduler_tick() function, which decides whether a process/thread needs to be preempted.
See "Preemption and Context Switching" on page 62 of Linux Kernel Development book.
The kernel, however, must know when to call schedule(). If it called schedule() only when code explicitly did so, user-space programs could run indefinitely. Instead, the kernel provides the need_resched flag to signify whether a reschedule should be performed (see Table 4.1). This flag is set by scheduler_tick() when a process should be preempted, and by try_to_wake_up() when a process that has a higher priority than the currently running process is awakened. The kernel checks the flag, sees that it is set, and calls schedule() to switch to a new process. The flag is a message to the kernel that the scheduler should be invoked as soon as possible because another process deserves to run.
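The tick-driven half of that mechanism can be illustrated with a deliberately simple fixed-slice model. It is only a sketch: struct task, scheduler_tick_model() and return_path_check_model() are hypothetical user-space stand-ins, and counting down whole ticks is an assumption made for clarity, not how CFS accounts for CPU time.

```c
#include <stdbool.h>

/* Hypothetical task with a fixed time slice measured in ticks. */
struct task {
    int ticks_left;
    bool need_resched;
};

/* Called from the model's timer interrupt: charge one tick to the
 * running task and set the flag once its slice is used up. */
static void scheduler_tick_model(struct task *cur)
{
    if (cur->ticks_left > 0)
        cur->ticks_left--;
    if (cur->ticks_left == 0)
        cur->need_resched = true;
}

/* Called on the model's return path to user space. Returns true if
 * schedule() would be invoked here. The flag is cleared and the task
 * receives a fresh slice for its next turn. */
static bool return_path_check_model(struct task *cur, int fresh_slice)
{
    if (!cur->need_resched)
        return false;
    cur->need_resched = false;
    cur->ticks_left = fresh_slice;
    return true;
}
```

The running program never calls anything itself: the timer interrupt sets the flag behind its back, and the kernel acts on the flag at the next return to user space.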
Does it use some kind of CPU interrupt
Yes! Modern preemptive kernels are absolutely dependent upon interrupts from hardware to deliver good I/O performance. Keyboard, mouse, disk, NIC, USB, etc. drivers are all entered from interrupts and can make threads that are waiting on them ready/running when required (e.g., when data is available).
Threads can also change state as a result of making an OS call that changes the caller's own state or that of another thread.
The interrupt from the hardware timer is one of many interrupt sources and is only special in that many system operations have timeouts that are signaled by this interrupt. Other than that, the timer interrupt just causes a reschedule which, in most cases, changes nothing re. the ready/running state of threads. If the machine is grossly CPU-overloaded to the point where there are more ready threads than there are cores, there is a side-effect of the timer interrupt that causes CPU time to be shared amongst the ready threads.
Do not fixate on the timer interrupt—the other driver interrupts are absolutely essential. It is not impossible to build a functional preemptive multithreaded kernel with no timer interrupt at all.

Resources