Which context is a given function called in, in the Linux kernel

Is there a straightforward mechanism to identify whether a given function is called from interrupt context or from process context? That is the first part of the question. The second part is: how do I synchronize two execution paths, one in interrupt context and the other in process context? If my understanding is right, we cannot use mutexes in interrupt context, since interrupt context is not allowed to sleep. On the other hand, if I use spinlocks, the other side will burn CPU cycles busy-waiting. What is the best way to synchronize these two? Correct me if my understanding is totally wrong.

You can tell whether a function was run as an IRQ handler using the in_irq() macro. But I don't think it's good practice to rely on it: you should be able to tell from the code itself in which context your function runs. Otherwise I'd say your code has a design problem.
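A quick sketch of what that looks like (in_irq() and in_interrupt() are real kernel macros from <linux/hardirq.h>/<linux/preempt.h>, and newer kernels prefer in_hardirq(); the helper name here is invented, and the caveat above stands -- needing this usually hints at a design problem):

    #include <linux/hardirq.h>
    #include <linux/printk.h>

    static void where_am_i(void)
    {
        if (in_irq())                 /* hard interrupt handler */
            pr_info("hard IRQ context\n");
        else if (in_interrupt())      /* softirq/tasklet/BH context */
            pr_info("softirq or other atomic interrupt context\n");
        else
            pr_info("process context\n");
    }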
As for the synchronization mechanism -- you are right, you have to use a spinlock, because the synchronization has to work in atomic context (e.g. an interrupt), so you don't have much choice here. You are also right that CPU cycles are wasted while spinning on the lock, so you should minimize the amount of code that runs under the lock.
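A minimal sketch of the usual pattern (all names here are illustrative): the process-context side must take the lock with local interrupts disabled, so the interrupt handler can never spin forever against a lock holder on its own CPU.

    #include <linux/interrupt.h>
    #include <linux/spinlock.h>

    static DEFINE_SPINLOCK(shared_lock);
    static int shared_data;

    /* Process context: disable local IRQs while holding the lock,
       otherwise the handler below could deadlock on this CPU. */
    static void update_from_process(int v)
    {
        unsigned long flags;

        spin_lock_irqsave(&shared_lock, flags);
        shared_data = v;
        spin_unlock_irqrestore(&shared_lock, flags);
    }

    /* Interrupt context (registered with request_irq() elsewhere):
       a plain spin_lock() is enough, since the local IRQ line is
       already masked while the handler runs. */
    static irqreturn_t my_irq_handler(int irq, void *dev_id)
    {
        spin_lock(&shared_lock);
        shared_data++;
        spin_unlock(&shared_lock);
        return IRQ_HANDLED;
    }

spin_lock_irqsave()/spin_unlock_irqrestore() and DEFINE_SPINLOCK() are the standard kernel primitives for exactly this situation.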

Adding to Sam's answer -- you should design your interrupt handler with top-half and bottom-half sections. This lets you keep minimal code (the top half) in the interrupt handler itself (which you register when requesting the IRQ in the driver), and schedule the rest (the bottom half) using a work queue.
You can keep this top half (where you just handle the interrupt and do some minimal reads/writes on the device) inside an atomic section protected by a spinlock, so that fewer CPU cycles are wasted waiting for the spinlock.
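A hedged sketch of that split for a hypothetical device (the "mydev" names and MYDEV_IRQ are invented for this example; the workqueue and IRQ calls are the standard kernel APIs):

    #include <linux/init.h>
    #include <linux/interrupt.h>
    #include <linux/printk.h>
    #include <linux/spinlock.h>
    #include <linux/workqueue.h>

    #define MYDEV_IRQ 42   /* placeholder IRQ number for this sketch */

    static DEFINE_SPINLOCK(mydev_lock);
    static struct work_struct mydev_work;

    /* Bottom half: runs later in process context via a work queue,
       so it may sleep, allocate memory, talk to the bus, etc. */
    static void mydev_bottom_half(struct work_struct *work)
    {
        pr_info("mydev: handling deferred work\n");
    }

    /* Top half: registered with request_irq(); keep it short. */
    static irqreturn_t mydev_top_half(int irq, void *dev_id)
    {
        unsigned long flags;

        spin_lock_irqsave(&mydev_lock, flags);
        /* acknowledge the device, copy out the minimum data */
        spin_unlock_irqrestore(&mydev_lock, flags);

        schedule_work(&mydev_work);     /* defer the rest */
        return IRQ_HANDLED;
    }

    static int __init mydev_init(void)
    {
        INIT_WORK(&mydev_work, mydev_bottom_half);
        return request_irq(MYDEV_IRQ, mydev_top_half, 0, "mydev", NULL);
    }

On newer kernels, request_threaded_irq() gives you much the same top-half/bottom-half split with the deferred part running in a dedicated kernel thread.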

Context switch between kernel threads vs user threads

Copy pasted from this link:
Thread switching does not require Kernel mode privileges.
User level threads are fast to create and manage.
Kernel threads are generally slower to create and manage than the user threads.
Transfer of control from one thread to another within the same process requires a mode switch to the Kernel.
I never came across these points while reading standard operating systems reference books. Though these points sound logical, I wanted to know how they play out in Linux. To be precise:
Can someone give detailed steps involved in context switching between user threads and kernel threads, so that I can see where the steps differ between the two?
Can someone explain the difference with an actual context-switch example or code? Maybe the system calls involved (in the case of context switching between kernel threads) and the thread library calls involved (in the case of context switching between user threads).
Can someone link me to the Linux source code line (say, on GitHub) handling the context switch?
I also wonder why a context switch between kernel threads requires changing to kernel mode. Aren't we already in kernel mode for the first thread?
Can someone give detailed steps involved in context switching between user threads and kernel threads, so that I can see where the steps differ between the two?
Let's imagine a thread needs to read data from a file, but the file isn't cached in memory and disk drives are slow so the thread has to wait; and for simplicity let's also assume that the kernel is monolithic.
For kernel threading:
thread calls a "read()" function in a library or something; which must cause at least a switch to kernel code (because it's going to involve device drivers).
the kernel adds the IO request to the disk driver's "queue of possibly many pending requests"; realizes the thread will need to wait until the request completes, sets the thread to "blocked waiting for IO" and switches to a different thread (that may belong to a completely different process, depending on global thread priorities). The kernel returns to the user-space of whatever thread it switched to.
later; the disk hardware causes an IRQ which causes a switch back to the IRQ handler in kernel code. The disk driver finishes up the work it had to do for the (currently blocked) thread and unblocks that thread. At this point the kernel might decide to switch to the "now unblocked" thread; and the kernel returns to the user-space of the "now unblocked" thread.
For user threading:
thread calls a "read()" function in a library or something; which must cause at least a switch to kernel code (because it's going to involve device drivers).
the kernel adds the IO request to the disk driver's "queue of possibly many pending requests"; realizes the thread will need to wait until the request completes but can't take care of that because some fool decided to make everything worse by doing thread switching in user space, so the kernel returns to user-space with "IO request has been queued" status.
after the pointless extra overhead of switching back to user-space; the user-space scheduler does the thread switch that the kernel could have done. At this point the user-space scheduler will either tell kernel it has nothing to do and you'll have more pointless extra overhead switching back to kernel; or user-space scheduler will do a thread switch to another thread in the same process (which may be the wrong thread because a thread in a different process is higher priority).
later; the disk hardware causes an IRQ which causes a switch back to the IRQ handler in kernel code. The disk driver finishes up the work it had to do for the (currently blocked) thread; but the kernel isn't able to do the thread switch to unblock the thread because some fool decided to make everything worse by doing thread switching in user space. Now we've got a problem - how does kernel inform the user-space scheduler that the IO has finished? To solve this (without any "user-space scheduler running zero threads constantly polls kernel" insanity) you have to have some kind of "kernel puts notification of IO completion on some kind of queue and (if the process was idle) wakes the process up" which (on its own) will be more expensive than just doing the thread switch in the kernel. Of course if the process wasn't idle then code in user-space is going to have to poll its notification queue to find out if/when the "notification of IO completion" arrives, and that's going to increase latency and overhead. In any case, after lots of stupid pointless and avoidable overhead; the user-space scheduler can do the thread switch.
Can someone explain the difference with an actual context-switch example or code? Maybe the system calls involved (in the case of context switching between kernel threads) and the thread library calls involved (in the case of context switching between user threads).
The actual low-level context switch code typically begins with something like the following (a concrete sketch appears after the list):
save whichever registers are "callee preserved" according to the calling conventions on the stack
save the current stack top in some kind of "thread info structure" belonging to the old thread
load a new stack top from some kind of "thread info structure" belonging to the new thread
pop whichever registers are "callee preserved" according to the calling conventions
return
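For illustration only (this is not Linux's actual switch_to), those five steps might look like this on x86-64 under the System V ABI; struct thread_info and switch_stacks are names invented for this sketch:

    /* switch_stacks(old, next): save callee-saved registers, swap
       stack pointers, restore, return on the new thread's stack. */
    struct thread_info { void *stack_top; };

    void switch_stacks(struct thread_info *old, struct thread_info *next);

    __asm__(
        ".text\n"
        ".globl switch_stacks\n"
        "switch_stacks:\n"
        "    pushq %rbp\n"           /* 1. save callee-saved regs   */
        "    pushq %rbx\n"
        "    pushq %r12\n"
        "    pushq %r13\n"
        "    pushq %r14\n"
        "    pushq %r15\n"
        "    movq %rsp, (%rdi)\n"    /* 2. old->stack_top = rsp     */
        "    movq (%rsi), %rsp\n"    /* 3. rsp = next->stack_top    */
        "    popq %r15\n"            /* 4. restore new thread regs  */
        "    popq %r14\n"
        "    popq %r13\n"
        "    popq %r12\n"
        "    popq %rbx\n"
        "    popq %rbp\n"
        "    ret\n"                  /* 5. return on the new stack  */
    );

A thread's stack has to be initialized to look as if it had been switched away from (six saved registers topped by a return address) before it can be switched to for the first time.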
However:
usually (for modern CPUs) there's a relatively large amount of "SIMD register state" (e.g. for 80x86 with support for AVX-512 I think it's over 4 KiB of stuff). CPU manufacturers often have mechanisms to avoid saving parts of that state if it wasn't changed, and to (optionally) postpone the loading of (pieces of) that state until it's actually used (and avoid it completely if it's not actually used). All of that requires kernel support.
if it's a task switch (and not just a thread switch within the same process) you might need some kind of "if virtual address space needs to change { change virtual address space }" on top of that
normally you want to keep track of statistics, like how much CPU time a thread has used. This requires some kind of "thread_info.time_used += now() - time_at_last_thread_switch;", which gets difficult/ugly when "process switching" is separated from "thread switching".
normally there's other state (e.g. pointer to thread local storage, special registers for performance monitoring and/or debugging, ...) that may need to be saved/loaded during thread switches. Often this state is not directly accessible in user code.
normally you also want to set a timer to expire when the thread has used too much time; either because you're doing some kind of "time multiplexing" (e.g. round-robin scheduler) or because it's a cooperative scheduler where you need some kind of "terminate this task after 5 seconds of not responding, in case it goes into an infinite loop forever" safeguard.
this is just the low level task/thread switching in isolation. There is almost always higher level code to select a task to switch to, handle "thread used too much CPU time", etc.
Can someone link me to the Linux source code line (say, on GitHub) handling the context switch?
Someone probably can't. It's not one line; it's many lines of assembly for each different architecture, plus extra higher-level code (for timers, support routines, the "select a task to switch to" code, for exception handlers to support "lazy SIMD state load", ...); which probably all adds up to something like 10 thousand lines of code spread across 50 files.
I also wonder why a context switch between kernel threads requires changing to kernel mode. Aren't we already in kernel mode for the first thread?
Yes; often you're already in kernel code when you find out that a thread switch is needed.
Rarely/sometimes the kernel isn't involved (mostly only for communication between threads belonging to the same process - e.g. two or more threads in the same process trying to acquire the same mutex/semaphore at the same time, or threads sending data to each other and waiting for data from each other to arrive). In some cases (which are almost always massive design failures - e.g. extreme lock contention problems, failure to use "worker thread pools" to limit the number of threads needed, etc.) it's possible for this to be the dominant cause of thread switches, and therefore possible that doing thread switches in user space is beneficial (e.g. as a work-around for the massive design failures).
Don't limit yourself to Linux or even UNIX; they are neither the first nor the last word on systems or programming models. The synchronous execution model dates back to the early days of computing, and it is not particularly well suited to larger-scale concurrent and reactive programming.
Golang, for example, employs a great many lightweight user threads - goroutines - and multiplexes them onto a smaller set of heavyweight kernel threads to produce a more compelling concurrency paradigm. Some other programming systems take similar approaches.

Top halves and bottom halves concept clarification

As per the guide lines of Top halves and Bottom halves, When any interrupt comes it is handled by two halves. The so-called top half is the routine that actually responds to the interrupt—the one you register with request_irq. The bottom half is a routine that is scheduled by the top half to be executed later, at a safer time. The big difference between the top-half handler and the bottom half is that all interrupts are enabled during execution of the bottom half—that's why it runs at a safer time. In the typical scenario, the top half saves device data to a device-specific buffer, schedules its bottom half, and exits: this operation is very fast. The bottom half then performs whatever other work is required, such as awakening processes, starting up another I/O operation, and so on. This setup permits the top half to service a new interrupt while the bottom half is still working.
But if the interrupt is handled at a "safer time" by the bottom half, then logically the interrupt's work has to wait until the bottom half finds that safer time to execute, which would seem to limit the system: everything waits until the interrupt is fully handled. For example, if I am working on a project that gives an LED blink indication when the temperature goes above a specific limit, and the interrupt handling is done only when some safe time is available (according to the bottom-halves concept), then the blink operation will be delayed... Please clarify my doubt: how are all the interrupts handled?
When top-half/bottom-half interrupt architecture is used, there is commonly a high-priority interrupt handling thread.
This interrupt handling thread has a higher priority than other threads in the system (some vendor SDKs specify an "Interrupt" priority level for this purpose). There is often a queue, and the thread sleeps when there is no work in the queue. This thread/queue is designed so that work can be safely added from an interrupt context.
When a top-half handler is called, it handles the hardware operations and then adds the bottom-half handler(s) to the interrupt queue. The top-half handler returns, and interrupt context is exited. The OS then checks which thread runs next. Because the interrupt thread has work available, and because it has the highest priority, it runs next. This minimizes the latency that you are worried about.
There will naturally be some latency jitter, because there may be other interrupts in the queue that fired ahead of the LED (in your example). There are different solutions to this, depending on the application and real-time requirements. One is to have a sorted queue based on an interrupt priority level. This incurs additional cost when enqueueing operations, but it also ensures your interrupts will be handled by priority. The other option, for critical latencies, is to do all the work in the top-half interrupt handler.
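For illustration, here is a toy C sketch of the priority-sorted queue idea just mentioned (all names invented; the O(n) insertion scan is the extra enqueue cost noted above):

    #include <stddef.h>

    struct bh_item {
        int priority;                  /* lower value = runs first */
        void (*handler)(void *);
        void *arg;
        struct bh_item *next;
    };

    static struct bh_item *bh_queue;   /* guarded by an IRQ-safe lock */

    /* Insert so the queue stays sorted by priority; items of equal
       priority keep FIFO order. */
    static void bh_enqueue(struct bh_item *item)
    {
        struct bh_item **p = &bh_queue;

        while (*p && (*p)->priority <= item->priority)
            p = &(*p)->next;
        item->next = *p;
        *p = item;
    }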
It's important to keep in mind the purposes of such an architecture:
Minimize the time spent in interrupt context, because other interrupts are (likely) disabled while you are processing your current interrupt and you are increasing the latency for handling them
Prevent users from calling functions which are not safe to invoke from an interrupt context
We still want our bottom-half handlers to be run as soon as possible to reduce latency, so when you say "wait for a safer time", this means "outside of the interrupt context".
If your blink operation is too important to be delayed, you could put it into the top half and perhaps have no bottom half at all. However, depending on what you do in the top half, it may or may not affect system performance.
I would suggest you write code for both cases and do some profiling.
Interrupt: An interrupt is an event that alters the sequence of instructions executed by a processor, in response to an electrical signal generated by hardware circuits both inside and outside the CPU.
When any interrupt is generated, it is handled in two halves:
1) Top half
2) Bottom half
Top half: The top half executes as soon as the CPU receives the interrupt. In top-half context, interrupts and the scheduler are disabled. This part of the code contains only the critical work. Its execution time should be as short as possible, because interrupts are disabled during this time and we don't want to miss other interrupts generated by the devices.
Bottom half: The job of the bottom half is to run the work left over (deferred) by the top half. While this code is being executed, interrupts are enabled and the scheduler is disabled. Bottom halves are scheduled via softirqs and tasklets to run the deferred work.
Note: The top-half code should be as short as possible (or at least take a deterministic amount of time) and should not contain any blocking calls.
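A hedged sketch of that split using a tasklet (this uses the pre-5.9 tasklet API, where the callback takes an unsigned long; the "mydrv" names are invented, and the IRQ handler would be registered with request_irq() elsewhere):

    #include <linux/interrupt.h>
    #include <linux/printk.h>

    /* Bottom half: interrupts are enabled here, but we are still in
       atomic (softirq) context, so sleeping is not allowed. */
    static void mydrv_tasklet_fn(unsigned long data)
    {
        pr_info("mydrv: deferred work running\n");
    }

    static DECLARE_TASKLET(mydrv_tasklet, mydrv_tasklet_fn, 0);

    /* Top half: just acknowledge the hardware and defer the rest. */
    static irqreturn_t mydrv_irq(int irq, void *dev_id)
    {
        tasklet_schedule(&mydrv_tasklet);
        return IRQ_HANDLED;
    }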

SuspendThread in Windows

Keeping my question short... I am writing a simulation of an RTOS. As usual, the main problem comes with simulating context switches. In the case of interrupts, it is becoming really hard not to deviate from "good" coding guidelines.
Say Task A is running, and the user application is calculating its harmless private stuff, which will run for a long time. During Task A, an interrupt X is supposed to occur. (Hint: Task A has nothing to do with triggering this interrupt X.) Now how do I perform a context switch from Task A to the interrupt X handler?
My current implementation is based on a context thread that waits until some context switch is requested; an interrupt controller thread that can generate interrupts if someone requests interrupt triggering; and a main thread that is running Task A. I use the interrupt controller thread to create a new thread for interrupt X and then ask the context thread to do the context switch. The context thread suspends the Task A main thread and resumes the interrupt X handler thread. At the end of the interrupt X handler thread, the Task A main thread is resumed.
[Edit] Just to clarify: I already know that suspending and terminating threads from outside is really bad. That is why I asked this question. Also, please don't recommend using events etc. to control Task A; it is user application code and I can't control it. The user can even write while(1){} if they want...
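For reference, a bare sketch of the mechanism described above, using the real Win32 calls (SuspendThread, ResumeThread, CreateThread, WaitForSingleObject); error handling is omitted, the names are invented, and, as the answer below explains, this approach is unsafe in general:

    #include <windows.h>

    static HANDLE task_a_thread;   /* the thread running user Task A */

    static DWORD WINAPI irq_x_handler(LPVOID arg)
    {
        /* simulated interrupt service routine for interrupt X */
        return 0;
    }

    /* Called by the "context thread" when interrupt X is triggered. */
    static void simulate_interrupt_x(void)
    {
        HANDLE h;

        SuspendThread(task_a_thread);            /* "preempt" Task A   */
        h = CreateThread(NULL, 0, irq_x_handler, NULL, 0, NULL);
        WaitForSingleObject(h, INFINITE);        /* run handler to end */
        CloseHandle(h);
        ResumeThread(task_a_thread);             /* resume Task A      */
    }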
I suspect that you can't do what you want to do in that way.
You mentioned that suspending a thread from outside is really bad. The reason is that you have no idea what the thread is doing when you suspend it. It's impossible to know whether the thread currently owns a mutex; if it does then any other thread that tries to access the same mutex is going to deadlock.
You have the problem that the runtime being used by the threads that might be suspended is the same as the one being used by the supervisor. That means there are many potential such deadlocks between the supervisor and the other threads.
In a real environment (i.e. not a simulator), the operating system kernel can suspend threads because there are checks in place to ensure that these deadlocks can't happen. I don't know the details, but it probably involves masking interrupts at certain critical points, and probably not sharing the same mutexes between user-mode code and critical parts of the kernel scheduler. (In your case that would mean your scheduler could not use any of the same OS API functions, either directly or indirectly, as are allowed to be used by the user threads, in case they involve mutexes. This of course would be virtually impossible to achieve.)
The reason I asked in a comment whether you have any control over the user code compiler is that if you controlled the compiler then you could arrange for the user code to effectively mask interrupts for the duration of each instruction and only yield to another thread at well-defined points between instructions. This is how it is done in a control system that I work on.
The other aspect is platform dependence. In Linux and other unix-like operating systems, you have signals, which are like user-mode interrupts. You could potentially use signals to emulate context switching, although you would still have the same problem with mutexes. There is absolutely no equivalent on Windows (as far as I know) precisely because of the problem already stated. The nearest thing is an asynchronous procedure call, but this will run only when the thread has put itself into an alertable wait state (which means the thread is in a deterministic state and is now safe to interrupt).
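To illustrate the signal idea on Linux (a toy sketch; the mutex caveat above still applies, and a real simulator might swapcontext() to a scheduler inside the handler instead of just printing):

    #include <signal.h>
    #include <string.h>
    #include <unistd.h>

    /* The SIGALRM handler plays the role of a "user-mode interrupt".
       Only async-signal-safe calls (like write) are allowed here. */
    static void on_alarm(int sig)
    {
        (void)sig;
        write(STDOUT_FILENO, "tick: simulated interrupt\n", 26);
        alarm(1);                       /* re-arm the "interrupt" */
    }

    int main(void)
    {
        struct sigaction sa;

        memset(&sa, 0, sizeof sa);
        sa.sa_handler = on_alarm;
        sigaction(SIGALRM, &sa, NULL);
        alarm(1);                       /* first "interrupt" in 1 second */

        for (;;)                        /* the uncooperative user task: */
            ;                           /* while(1){} as in the question */
    }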
I think you are going to have to re-think the whole concept so that your supervisory thread has the sort of privileged control above the user threads that the OS has in a non-emulated environment. That will probably involve replacing the compiler or the run-time libraries, or both, with something of your own making.

Understanding how threads are implemented

I've been trying to understand the mechanics of implementing user-space threads, but I'm not able to understand the mechanics of the stack and frames. I've come across two really great resources (here and here) that explain threading and how it's implemented, but I still don't understand the following details:
How is the machine context used in thread execution? I know it consists of a stack pointer and a bunch of register values. But how exactly does the OS use it to execute the thread?
Why do we require a trampoline function (mctx_create_trampoline)? In link #2, they set up a function as a signal handler that saves the machine context and starts the thread function (mctx_create_boot).
Based on these functions, how does one implement a "yield" function that the threads can call? Also, how can we interrupt a running thread? I assume you'd have a timer and SIGALRM that calls a signal handler when it goes off. But if the yield function switches contexts, then the signal handler won't return, which would block further signals from being delivered.
Once a thread has been set off executing on a physical CPU the OS is no longer involved until either the time-slice expires or some other re-scheduling needs to take place. So the key question is: How are threads scheduled to physical CPUs? Well, the OS sets the physical CPU registers to the appropriate values and jumps to the location where the thread was last interrupted (which in effect sets the instruction pointer). At this point the OS has lost control and is no longer involved. It can only regain control if a hardware interrupt occurs or some other physical CPU core decides to take control of the CPU.
Can't open that document right now.
"yield" cannot be implemented in user-space. It generally is a kernel API that picks some other thread to schedule and schedules it to the current CPU.

Context switch in Interrupt handlers

Why can't a context switch happen while an interrupt handler is executing? More specifically, in the Linux kernel, interrupt handlers run in the context of the process that was interrupted. Why is it not possible to do a context switch in the interrupt handler to schedule another process?
On a multiprocessor, a context switch can certainly happen while an interrupt handler is executing. In fact, it would be difficult to prevent.
On a single-CPU machine, by definition it can only be running one thread of control at a time. It only has one register set, one ALU, etc. So if the interrupt handler is running there simply are no resources with which to execute a context switch.
Now, if you mean, can the interrupt handler actually call the context switch code and make one happen, well, I suppose on some systems that could be made to work. But for most, this wouldn't have much value and would be difficult to arrange. The CPU is running at elevated priority, and this priority cannot be lowered or synchronization between interrupt levels is lost. Critical sections in the OS are already synchronizing against interrupt execution and this would introduce complexities. Furthermore, a context switch happens by changing stacks, much like in a threaded user mode program, so it's hard to imagine how this might happen when the interrupt stack is needed for a return from the interrupt.
A couple of reasons, I guess, depending on the meaning of your question:
Q: Why would context switching during an interrupt be bad?
A: Interrupts are generally for interacting with hardware. Hardware is typically time-sensitive so the OS can't just stop dealing with it in the middle of something and come back when it feels like it.
Q: What stops a context switch from happening during an interrupt?
A: An interrupt happens in a special interrupt context, not a regular process context. Since it's not in a process, it's not subject to context switching as a normal process would be.
There's probably a better, deeper explanation to be made, but that's the extent of my own understanding of the matter.
