In the Linux kernel, why can't a mutex be acquired in a bottom half?

I am reading Linux Kernel Development and am confused by the differences between a mutex and a semaphore.
The author says:
A mutex cannot be acquired by an interrupt handler or bottom half
I know a mutex may sleep, and an interrupt handler does not run in any specific process context, so a mutex or semaphore is not allowed there. But a bottom half can be implemented with work queues, and those can sleep.
So why can't a mutex be acquired in a bottom half? Is the concern here simplicity and efficiency, or something else?

Mutex/semaphore locking can sleep, but BHs are designed not to sleep. Softirqs are checked for execution asynchronously in many places; for example, they can run every time you re-enable BHs (as in spin_unlock_bh). Making such code sleep on a mutex is a very bad idea: if you sleep while holding a BH spinlock you can cause other BH code to sleep and perhaps even deadlock the entire system.
From this point of view, workqueues are not considered BHs; they run in the context of kernel threads, which are free to sleep. So a mutex is fine in a workqueue but not in a tasklet.
BH is a vague term; I find it helpful to think of the Linux kernel as having three execution contexts: user (including kernel threads), softirq, and hardirq. Preemption by each of these can be controlled with a set of bits in preempt_count.
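To make the workqueue-vs-tasklet distinction concrete, here is a minimal sketch (the lock and handler names are made up, and the classic pre-5.9 tasklet API is assumed): the work handler runs in a kernel thread and may take a mutex, while the tasklet handler runs in softirq context and must stick to a spinlock.

    #include <linux/workqueue.h>
    #include <linux/mutex.h>
    #include <linux/interrupt.h>
    #include <linux/spinlock.h>

    static DEFINE_MUTEX(my_mutex);            /* hypothetical names */
    static DEFINE_SPINLOCK(my_spinlock);

    /* Workqueue handler: runs in a kernel thread, so sleeping is allowed. */
    static void my_work_fn(struct work_struct *work)
    {
        mutex_lock(&my_mutex);                /* may sleep: fine here */
        /* ... touch shared state ... */
        mutex_unlock(&my_mutex);
    }
    static DECLARE_WORK(my_work, my_work_fn);

    /* Tasklet handler: softirq (BH) context, must never sleep. */
    static void my_tasklet_fn(unsigned long data)
    {
        spin_lock(&my_spinlock);              /* never mutex_lock() here */
        /* ... touch shared state ... */
        spin_unlock(&my_spinlock);
    }
    static DECLARE_TASKLET(my_tasklet, my_tasklet_fn, 0);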

The main motive for creating the mutex was simplicity and efficiency. Since synchronization in bottom halves can be complicated, it is suggested that mutexes be avoided there: the design of bottom halves is simply not suited to mutexes. For example, a mutex must be locked and unlocked in the same context, and that would be hard to guarantee in a bottom half.
In theory you could implement the whole of interrupt handling differently, in a way in which the use of a mutex is justified, such as the "threaded" interrupt handlers: http://lwn.net/Articles/380931/
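As a hedged sketch of that approach (the function and device names are invented), a threaded handler registered with request_threaded_irq() runs in its own kernel thread, so unlike a classic bottom half it is allowed to sleep and therefore to take a mutex:

    #include <linux/interrupt.h>
    #include <linux/mutex.h>

    static DEFINE_MUTEX(dev_mutex);        /* hypothetical */

    /* Hard-IRQ (top) half: atomic context, just acknowledge and wake the thread. */
    static irqreturn_t my_quick_check(int irq, void *dev_id)
    {
        /* ... acknowledge the hardware ... */
        return IRQ_WAKE_THREAD;
    }

    /* Threaded half: runs in a dedicated kernel thread, so it may sleep. */
    static irqreturn_t my_thread_fn(int irq, void *dev_id)
    {
        mutex_lock(&dev_mutex);            /* legal here, unlike in a softirq/tasklet */
        /* ... longer-running device work ... */
        mutex_unlock(&dev_mutex);
        return IRQ_HANDLED;
    }

    /* In probe(), something like:
     *   request_threaded_irq(irq, my_quick_check, my_thread_fn,
     *                        IRQF_ONESHOT, "mydev", dev);
     */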

Related

What does it mean that code holding a semaphore can be preempted?

I was going through the Robert Love book and was a bit confused by this line. What does it mean that code holding a semaphore can be preempted?
If an interrupt occurs that accesses the same variable the user-space application is using while it is executing code in its critical section, can the user-space application be preempted?
If my understanding above is correct, is there no alternative but to use spin-locks that disable interrupts whenever a user-space application is in a critical section?
So what is the use of a semaphore in the context of an OS? Interrupts might occur at any time while the user application is in a critical section, and to avoid interrupt intervention we would need to use spin-locks all the time.
What does it mean that code holding a semaphore can be preempted?
It means that a process currently running in its critical section, holding some lock for the purpose of synchronization, can be preempted. Interrupts generally have the highest priority, so unless you disable interrupts on that processor core, the running process can be preempted, and that might happen while the process is in its critical section.
While there are multiple spin_lock_XXX APIs for disabling interrupts, you probably want spin_lock_irqsave, as it saves the interrupt flags on that core and restores them when the lock is released.
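As a minimal sketch of that pattern (the lock and counter names are invented), code sharing data with an interrupt handler typically looks like this:

    #include <linux/spinlock.h>

    static DEFINE_SPINLOCK(stats_lock);       /* hypothetical lock/data */
    static unsigned long packet_count;

    /* Called from process context; the same data is also touched from an IRQ handler. */
    static void bump_count(void)
    {
        unsigned long flags;

        spin_lock_irqsave(&stats_lock, flags);      /* disables local IRQs, saves prior state */
        packet_count++;
        spin_unlock_irqrestore(&stats_lock, flags); /* restores the saved IRQ state */
    }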

Is a spinlock lock free?

I am a little bit confused about the two concepts.
definition of lock-free on wiki:
A non-blocking algorithm is lock-free if there is guaranteed
system-wide progress
definition of non-blocking:
an algorithm is called non-blocking if failure or suspension of any
thread cannot cause failure or suspension of another thread
I thought a spinlock was lock-free, or at least non-blocking. But now I'm not sure, because by these definitions "a spinlock is not lock-free" also makes sense to me: if the thread holding the spinlock gets suspended, it will cause suspension of the other threads spinning outside it. So, by definition, a spinlock is not even non-blocking, let alone lock-free.
I'm so confused now. Can anyone explain it clearly?
Anything that can be called a lock (it excludes other threads from a critical section until the current thread unlocks) is by definition not lock-free. And yes, spinlocks are a kind of lock.
If a thread sleeps while holding the lock, no other thread can acquire it and make forward progress, and spinlocks can't prevent this. The OS can de-schedule a thread whenever it wants, even if it's in the middle of a critical section.
Note that "lock-free" isn't the same thing as "wait-free", so a lock-free algorithm can still have stuff like cmpxchg retry loops, but as long as one thread succeeds every time, it's lock free.
A wait-free algorithm can't even have that, and at most has to wait for cache misses / hardware arbitration of contended atomic operations. Wikipedia's non-blocking algorithm article defines wait-free and lock-free in more detail.
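For illustration, here is a minimal lock-free counter in C11 (the names are invented): the CAS retry loop can spin, but a retry only happens because some other thread's CAS just succeeded, so the system as a whole always makes progress, which is what makes it lock-free but not wait-free.

    #include <stdatomic.h>

    static atomic_int counter;      /* hypothetical shared counter */

    void lockfree_increment(void)
    {
        int old = atomic_load(&counter);
        /* If the CAS fails, 'old' is reloaded with the current value and we retry.
         * A failure implies another thread's CAS succeeded, so progress was made. */
        while (!atomic_compare_exchange_weak(&counter, &old, old + 1))
            ;
    }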
I think you're mixing up two definitions of "blocking".
I think you're talking about a spin_trylock function that tries to acquire a spinlock, and returns with an error if it fails instead of spinning. So this is non-blocking in the same sense as non-blocking I/O: fail with an error instead of waiting for resource availability.
That doesn't mean any thread in the system is making forward progress on the thing protected by the spinlock. It just means your thread can go and do something else before trying again, instead of needing to use separate threads to do something in parallel with waiting to acquire a lock.
Spinning in an infinite loop counts as blocking / not-making-progress. For this definition, there's no difference between a pure spinlock and one that (with OS assistance) sleeps until another thread unlocks.
The definition of lock-free isn't concerned with wasting CPU time / power to make room for independent work to happen.
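In Linux-kernel terms that is roughly what spin_trylock() gives you; a hedged sketch (names invented):

    #include <linux/spinlock.h>

    static DEFINE_SPINLOCK(pending_lock);     /* hypothetical */

    static void maybe_update(void)
    {
        if (spin_trylock(&pending_lock)) {
            /* Got the lock without spinning: do the protected work. */
            spin_unlock(&pending_lock);
        } else {
            /* Contended: go do something else and retry later. Note this says
             * nothing about whether another thread is making progress on the
             * protected data -- it is non-blocking for this thread only. */
        }
    }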
Somewhat related: acquiring an uncontended spinlock doesn't require a system call, which means it's a "light-weight" lock. Some lock implementations always use a (relatively slow) system call even in the uncontended case. See Jeff Preshing's Always Use a Lightweight Mutex article. Also read Jeff's other posts to learn more about lock-free programming, because they're excellent. So good in fact that the [lock-free] tag wiki links to them.

How do spinlocked threads avoid the overhead of context switching?

In Wikipedia's article about spinlocks:
Because they avoid overhead from operating system process rescheduling or context switching, spinlocks are efficient if threads are likely to be blocked for only short periods.
I actually can't grasp this sentence.
I think that even if a thread holds a spinlock, it's going to be rescheduled. Am I wrong?
The context-switching overhead - saving the registers, PC, and scheduling queue - is constant for all switches, isn't it?
I actually can't grasp this sentence. I think that even if a thread has a spinlock, it's going to be rescheduled, am I wrong?
Eventually it would be... when its timeslice expired.
What a spinlock avoids is the chance of having the thread get context-switched out immediately whenever it tries to acquire and the lock is already locked by another thread.
(In the traditional mutex case, when the mutex is already locked, the thread would immediately be put to sleep, i.e. context-switched out, and it would not be reawoken until after the other thread had unlocked the mutex. In the spinlock case, on the other hand, the thread would just keep checking the spinlock's state in a tight loop until the spinlock was no longer locked, and then the thread would lock the spinlock for itself. Note that at no point during that process does the thread ask the kernel to put it to sleep, although if it spun for a long time it's possible the kernel would do so anyway... but a program using spinlocks should never hold them for long anyway, since spinning is really inefficient.)
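To make the "tight loop" concrete, here is a toy user-space spinlock sketch in C11 (illustrative only, not the kernel's implementation): while waiting, no system call is made and the thread never asks to be put to sleep; it simply keeps re-testing the flag.

    #include <stdatomic.h>

    typedef struct { atomic_flag held; } toy_spinlock;   /* initialize with ATOMIC_FLAG_INIT */

    void toy_spin_lock(toy_spinlock *l)
    {
        /* Spin: burn CPU re-testing the flag instead of asking the OS to sleep us. */
        while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
            ;
    }

    void toy_spin_unlock(toy_spinlock *l)
    {
        atomic_flag_clear_explicit(&l->held, memory_order_release);
    }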
The context switching over-head - which is saving the registers, pc & scheduling queue - is constant for all switches, isn't it?
Yes, I believe it is.
Generally an OS is only going to use spinlocks in interrupt service routines. These are designed to be of short duration.
I actually can't grasp this sentence. I think that even if a thread has a spinlock, it's going to be rescheduled, am I wrong ?
Not while it is handling an interrupt (simplifying here by assuming there is only one IPL). That interrupt might be the timer interrupt, where a context switch may take place. However, in that situation, the spinlock wait would be for the resources necessary to process a context switch.

In which context is a given function called in the Linux kernel?

Is there a straightforward mechanism to identify whether a given function is called in interrupt context or from process context? That is the first part of the question. The second part is: how do I synchronize two processes, one which is in interrupt context and the other which is in process context? If my understanding is right, we cannot use mutexes for the one in interrupt context since it is not allowed to sleep. On the other hand, if I use spinlocks, the other process will burn CPU cycles. What is the best way to synchronize these two? Correct me if my understanding is totally wrong.
You can tell whether a function was run as an IRQ handler using the in_irq() function. But I don't think it's good practice to use it: you should be able to see just from the code in which context your function is being run. Otherwise I'd say your code has a bad design.
As for the synchronization mechanism -- you are right, you have to use a spinlock, because you need to synchronize with atomic context (e.g. an interrupt); you don't have much of a choice here. You are also right that CPU cycles will be wasted while waiting for the spinlock, so you should try to minimize the amount of your code under the lock.
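For completeness, a small sketch of such a check (the helper function is hypothetical; in_irq(), in_softirq() and pr_info() are the real kernel interfaces, historically pulled in via <linux/hardirq.h>):

    #include <linux/preempt.h>
    #include <linux/printk.h>

    /* Hypothetical debug helper: report which context it was called from. */
    static void report_context(void)
    {
        if (in_irq())
            pr_info("called from hard-IRQ context\n");
        else if (in_softirq())
            pr_info("called from softirq/BH context\n");
        else
            pr_info("called from process context\n");
    }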
Adding to Sam's answer: you should design your interrupt handler with top-half and bottom-half sections. This lets you keep minimal code (the top half) in the interrupt handler (which you register when requesting the IRQ in the driver), and schedule the rest (the bottom half) using a work queue.
You can keep this top half (where you just handle the interrupt and do some minimal reads/writes to the device) inside an atomic section protected by a spinlock, so that fewer CPU cycles are wasted waiting for the spinlock.
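A minimal sketch of that split (all names and the device details are invented): the top half does only short, spinlock-protected work and defers the rest to a work item.

    #include <linux/interrupt.h>
    #include <linux/workqueue.h>
    #include <linux/spinlock.h>

    static DEFINE_SPINLOCK(dev_lock);          /* hypothetical */

    /* Bottom half: runs later in a kernel thread, so it may sleep or take mutexes. */
    static void my_bottom_half(struct work_struct *work)
    {
        /* ... heavier processing ... */
    }
    static DECLARE_WORK(my_bh_work, my_bottom_half);

    /* Top half: keep the atomic, spinlock-protected section as short as possible. */
    static irqreturn_t my_top_half(int irq, void *dev_id)
    {
        spin_lock(&dev_lock);
        /* ... minimal register reads/writes to quiesce the device ... */
        spin_unlock(&dev_lock);

        schedule_work(&my_bh_work);            /* defer the rest to the bottom half */
        return IRQ_HANDLED;
    }

    /* Registered with something like: request_irq(irq, my_top_half, 0, "mydev", dev); */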

SuspendThread in Windows

Keeping my question short: I am writing a simulation of an RTOS. As usual, the main problem comes with simulating context switches. In the case of interrupts it is becoming really hard not to deviate from "good" coding guidelines.
Say Task A is running, and the user application is calculating its harmless private stuff, which will run for a long time. During Task A, an interrupt X is supposed to occur. (Hint: Task A has nothing to do with triggering this interrupt X.) Now how do I perform a context switch from Task A to the interrupt X handler?
My current implementation is based on a context thread that waits until some context switch is requested; an interrupt controller thread that can generate interrupts if someone requests interrupt triggering; and a main thread that is running Task A. I use the interrupt controller thread to spawn a new thread for interrupt X and then request the context thread to do the context switch. The context thread suspends the Task A main thread and resumes the interrupt X handler thread. At the end of the interrupt X handler thread, the Task A main thread is resumed.
[Edit] Just to clarify, I already know that suspending and terminating threads from outside is really bad. That is why I asked this question. Also, please don't recommend using events etc. for controlling Task A; it is user application code and I can't control it. The user can even write while(1){} if he wants...
I suspect that you can't do what you want to do in that way.
You mentioned that suspending a thread from outside is really bad. The reason is that you have no idea what the thread is doing when you suspend it. It's impossible to know whether the thread currently owns a mutex; if it does then any other thread that tries to access the same mutex is going to deadlock.
You have the problem that the runtime being used by the threads that might be suspended is the same as the one being used by the supervisor. That means there are many potential such deadlocks between the supervisor and the other threads.
In a real environment (i.e. not a simulator), the operating system kernel can suspend threads because there are checks in place to ensure that these deadlocks can't happen. I don't know the details, but it probably involves masking interrupts at certain critical points, and probably not sharing the same mutexes between user-mode code and critical parts of the kernel scheduler. (In your case that would mean your scheduler could not use any of the same OS API functions, either directly or indirectly, as are allowed to be used by the user threads, in case they involve mutexes. This of course would be virtually impossible to achieve.)
The reason I asked in a comment whether you have any control over the user code compiler is that if you controlled the compiler then you could arrange for the user code to effectively mask interrupts for the duration of each instruction and only yield to another thread at well-defined points between instructions. This is how it is done in a control system that I work on.
The other aspect is platform dependence. In Linux and other unix-like operating systems, you have signals, which are like user-mode interrupts. You could potentially use signals to emulate context switching, although you would still have the same problem with mutexes. There is absolutely no equivalent on Windows (as far as I know) precisely because of the problem already stated. The nearest thing is an asynchronous procedure call, but this will run only when the thread has put itself into an alertable wait state (which means the thread is in a deterministic state and is now safe to interrupt).
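A small sketch of the APC mechanism mentioned above (the worker logic is purely illustrative, not a drop-in design): the queued routine runs only once the target thread enters an alertable wait, i.e. at a point where it is known to be in a safe, deterministic state.

    #include <windows.h>
    #include <stdio.h>

    static void CALLBACK my_apc(ULONG_PTR param)
    {
        printf("APC ran with param %lu\n", (unsigned long)param);
    }

    static DWORD WINAPI worker(LPVOID arg)
    {
        (void)arg;
        for (;;)
            SleepEx(INFINITE, TRUE);   /* alertable wait: queued APCs are delivered here */
        return 0;
    }

    int main(void)
    {
        HANDLE h = CreateThread(NULL, 0, worker, NULL, 0, NULL);
        QueueUserAPC(my_apc, h, 42);   /* runs inside the worker's SleepEx */
        Sleep(100);                    /* give the APC a moment to run */
        CloseHandle(h);
        return 0;
    }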
I think you are going to have to re-think the whole concept so that your supervisory thread has the sort of privileged control above the user threads that the OS has in a non-emulated environment. That will probably involve replacing the compiler or the run-time libraries, or both, with something of your own making.
