Function calling the bottom half of an interrupt handler in Linux

In Linux, the handling of an interrupt is divided into two parts: the top half and the bottom half.
From my understanding, the bottom half of an interrupt handler can be implemented in several ways: softirqs, tasklets, work queues, and timer lists.
I want to know which function(s) in the Linux kernel schedule these bottom halves.
Edit: I looked into the handling of softirqs and tasklets, and it seems that both of them are handled through the __do_softirq (http://lxr.linux.no/linux+v2.6.32.58/kernel/softirq.c#L207) function. However, I still see many paths inside the handler execution which pass through the schedule() function of the Linux kernel and then diverge. I am not able to explain these paths properly.
Some intuition to guide you towards this function:
The scheduling of a pending task (bottom half) should be triggered by some event. Kernel events are either system calls or interrupts. I think that the event which triggers a bottom half is an interrupt and not a system call.
As far as I know, these are the steps followed on arrival of an interrupt:
1. Interrupt arrives at core
2. Top half of the interrupt handler is run
3. Check the pending queue to see if there is a task which needs attention.
4. If there is any pending task, then execute it
I was going through the function traces of many OS handlers and observed that the execution of many handlers passes through the schedule() function of the Linux kernel. Since this function is called so often from so many interrupt handlers, I suppose that the bottom halves of the interrupt handlers should be called from within this function.
The schedule() function calls the post_schedule() function at the end. I tracked all the functions between these two calls. There are many different function paths between them, raising the suspicion that the bottom-half functions must lie on the path from schedule() to post_schedule(). However, the sheer number of different macros and functions in the kernel makes it really difficult to pinpoint the function from which the scheduler jumps into the bottom half.

The top half of a device driver's interrupt handler must return IRQ_HANDLED, IRQ_WAKE_THREAD or IRQ_NONE to indicate to the interrupt subsystem whether the IRQ was handled. If IRQ_WAKE_THREAD is returned, the threaded bottom half of the interrupt handler is scheduled for execution. Normally bottom halves have higher priority than other normal kernel tasks. See https://lwn.net/Articles/302043/ for more details.
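As a rough sketch of that pattern (the device structure, register offsets and names below are made up for illustration), a driver using the threaded-IRQ mechanism might look like this:

```c
#include <linux/interrupt.h>
#include <linux/io.h>
#include <linux/types.h>

/* Hypothetical device structure and register offsets, for illustration. */
struct my_dev {
    void __iomem *regs;
    int irq;
};

/* Top half: runs in hard-IRQ context. Acknowledge the hardware and
 * decide whether the threaded bottom half needs to run. */
static irqreturn_t my_top_half(int irq, void *dev_id)
{
    struct my_dev *dev = dev_id;
    u32 status = readl(dev->regs + 0x00);   /* hypothetical status register */

    if (!(status & 0x1))
        return IRQ_NONE;                    /* not our interrupt */

    writel(status, dev->regs + 0x04);       /* hypothetical ack register */
    return IRQ_WAKE_THREAD;                 /* defer the real work */
}

/* Bottom half: runs in a dedicated kernel thread, so it may sleep. */
static irqreturn_t my_bottom_half(int irq, void *dev_id)
{
    /* ... slow processing: copy data, wake up readers, etc. ... */
    return IRQ_HANDLED;
}

/* Registration, e.g. from the driver's probe(): */
static int my_setup_irq(struct my_dev *dev)
{
    return request_threaded_irq(dev->irq, my_top_half, my_bottom_half,
                                0, "my_dev", dev);
}
```

The kernel runs my_bottom_half in a per-IRQ kernel thread whose priority sits above normal tasks, which is the scheduling behavior described above.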

Related

Can the same timer interrupt occur in parallel?

I implemented a timer interrupt handler in a kernel module.
This timer interrupt handler requires about 1000us to run.
And I want this timer to trigger every 10us.
(In doing so, I hope the same handler will be performed in parallel.)
(I know that this can create a tremendous amount of interrupt overhead, but I want to implement it for some testing.)
But this handler does not seem to run in parallel.
The timer interrupt seems to wait until the handler in progress is finished.
Can the same timer interrupt occur in parallel?
If not, is there a kernel mechanism that can run the same handler in parallel?
If the timer triggers every 10us and requires 1000us (1ms) to complete, you would require 100 dedicated CPUs to barely keep up with the timer. The short answer is no, the interrupt system isn't going to support this. If an interrupt recursed, it would inevitably consume the interrupt handler stack.
Interrupts typically work by having a short bit of code be directly invoked when the interrupt asserts. If more work is to be done, this short bit would schedule a slower bit to follow on, and inhibit this source of interrupt. This is to minimize the latency caused by disparate devices seeking cpu attention. The slower bit, when it determines it has satiated the device request, can re-enable interrupts from this source.
[ In Linux, the short bit is called the top half; the slower bit the bottom half. It is a bit confusing, because decades of kernel implementation pre-Linux named it exactly the other way around. Best to avoid these terms. ]
One of many ways to get the effect you desire is to have this slow handler release a semaphore, then re-enable the interrupt. You could then have an appropriate number of threads sit in a loop, acquiring the semaphore and then performing your task.
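A minimal sketch of that idea using the kernel's semaphore and kthread APIs (the names and the thread count are hypothetical):

```c
#include <linux/interrupt.h>
#include <linux/kthread.h>
#include <linux/semaphore.h>

static struct semaphore work_sem;

/* Interrupt handler: acknowledge/mask the device, then release one worker. */
static irqreturn_t my_handler(int irq, void *dev_id)
{
    /* ... acknowledge the device; re-enable the source when satisfied ... */
    up(&work_sem);              /* safe to call from interrupt context */
    return IRQ_HANDLED;
}

/* Worker thread body: block on the semaphore, then do the slow work. */
static int my_worker(void *data)
{
    while (!kthread_should_stop()) {
        if (down_interruptible(&work_sem))
            continue;           /* interrupted; re-check the stop condition */
        /* ... the ~1000us of processing goes here, outside IRQ context ... */
    }
    return 0;
}

/* Setup: one semaphore, N worker threads. */
static void my_start_workers(int nthreads)
{
    int i;

    sema_init(&work_sem, 0);
    for (i = 0; i < nthreads; i++)
        kthread_run(my_worker, NULL, "my_worker/%d", i);
}
```

With enough worker threads (and enough CPUs), several instances of the slow work can genuinely run in parallel, even though the interrupt handler itself never does.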

Top halves and bottom halves concept clarification

As per the guide lines of Top halves and Bottom halves, When any interrupt comes it is handled by two halves. The so-called top half is the routine that actually responds to the interrupt—the one you register with request_irq. The bottom half is a routine that is scheduled by the top half to be executed later, at a safer time. The big difference between the top-half handler and the bottom half is that all interrupts are enabled during execution of the bottom half—that's why it runs at a safer time. In the typical scenario, the top half saves device data to a device-specific buffer, schedules its bottom half, and exits: this operation is very fast. The bottom half then performs whatever other work is required, such as awakening processes, starting up another I/O operation, and so on. This setup permits the top half to service a new interrupt while the bottom half is still working.
But if the interrupt is handled at a safer time by the bottom half, then logically, when an interrupt comes, it has to wait until the bottom half finds some safer time to execute. That would limit the system, which would have to wait until the interrupt is handled. For example: if I am working on a project to give an LED blink indication when the temperature goes above a specific limit, and interrupt handling is done only when some safe time is available (according to the bottom-half concept), then the blink operation will be delayed. Please clarify my doubt: how are all the interrupts handled?
When top-half/bottom-half interrupt architecture is used, there is commonly a high-priority interrupt handling thread.
This interrupt handling thread has a higher priority than other threads in the system (some vendor SDKs specify an "Interrupt" priority level for this purpose). There is often a queue, and the thread sleeps when there is no work in the queue. This thread/queue is designed so that work can be safely added from an interrupt context.
When a top-half handler is called, it handles the hardware operations and then adds the bottom-half handler(s) to the interrupt queue. The top-half handler returns and interrupt context is exited. The OS then checks which thread should run next. Because the interrupt thread has work available, and because it is the highest priority, it will run next. This minimizes the latency that you are worried about.
There will naturally be some latency jitter, because there may be other interrupts in the queue that fired ahead of the LED (in your example). There are different solutions to this, depending on the application and real-time requirements. One is to have a sorted queue based on an interrupt priority level. This incurs additional cost when enqueueing operations, but it also ensures your interrupts will be handled by priority. The other option, for critical latencies, is to do all the work in the top-half interrupt handler.
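A sketch of such a priority-sorted deferred-work queue, using the kernel's list API (the item structure and function names are invented for illustration):

```c
#include <linux/list.h>
#include <linux/spinlock.h>

/* Hypothetical deferred-work item carrying a priority. */
struct deferred_work {
    struct list_head node;
    int prio;                           /* lower value = more urgent */
    void (*fn)(struct deferred_work *w);
};

static LIST_HEAD(work_list);
static DEFINE_SPINLOCK(work_lock);

/* Called from the top half: insert in priority order. The insert is O(n),
 * which is the extra enqueue cost mentioned above, but the handler thread
 * then always dequeues the most urgent item first. */
static void enqueue_deferred(struct deferred_work *w)
{
    struct deferred_work *pos;
    unsigned long flags;

    spin_lock_irqsave(&work_lock, flags);
    list_for_each_entry(pos, &work_list, node)
        if (w->prio < pos->prio)
            break;
    list_add_tail(&w->node, &pos->node);  /* before 'pos', or at the tail */
    spin_unlock_irqrestore(&work_lock, flags);
}
```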
It's important to keep in mind the purposes of such an architecture:
1. Minimize the time spent in interrupt context, because other interrupts are (likely) disabled while you are processing the current one, and you are increasing the latency for handling them.
2. Prevent users from calling functions which are not safe to invoke from an interrupt context.
We still want our bottom-half handlers to run as soon as possible to reduce latency, so "wait for a safer time" really means "outside of the interrupt context".
If your blink operation is too important to be delayed, you could put it into the top half and perhaps have no bottom half at all. However, depending on what you do in the top half, it may or may not affect system performance.
I would suggest you write code for both cases and do some profiling.
Interrupt: An interrupt is an event that alters the sequence of instructions executed by a processor, in response to an electrical signal generated by hardware circuits both inside and outside the CPU.
When any interrupt is generated, it is handled by two halves:
1) Top half
2) Bottom half
Top half: The top half executes as soon as the CPU receives the interrupt. In the top-half context, interrupts and the scheduler are disabled. This part of the code contains only critical code. Its execution time should be as short as possible, because interrupts are disabled at this time and we don't want to miss other interrupts generated by the devices.
Bottom half: The job of the bottom half is to run the work left over (deferred) by the top half. When this piece of code executes, interrupts are enabled and the scheduler is disabled. Bottom halves are scheduled via softirqs and tasklets to run the deferred work.
Note: The top-half code should be as short as possible, or at least take deterministic time, and should not contain any blocking calls.
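As an illustration of that split, here is the classic tasklet pattern (this uses the pre-5.9 tasklet API, matching the 2.6.32 tree linked above; all names are hypothetical):

```c
#include <linux/interrupt.h>

/* Bottom half: runs in softirq context with interrupts enabled,
 * but it must not sleep. */
static void my_tasklet_fn(unsigned long data)
{
    /* ... deferred work ... */
}

static DECLARE_TASKLET(my_tasklet, my_tasklet_fn, 0);

/* Top half: short, deterministic work only. */
static irqreturn_t my_handler(int irq, void *dev_id)
{
    /* ... read the status, ack the device ... */
    tasklet_schedule(&my_tasklet);      /* defer the rest */
    return IRQ_HANDLED;
}
```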

Which context a given function is called in Linux Kernel

Is there a straightforward mechanism to identify whether a given function is called in interrupt context or in process context? This is the first part of the question. The second part is: how do I synchronize two paths, one which runs in interrupt context and the other in process context? If my understanding is right, we cannot use mutexes for the code in interrupt context, since it is not allowed to sleep. On the other hand, if I use spinlocks, the other process will burn CPU cycles. What is the best way to synchronize these two? Correct me if my understanding is totally wrong.
You can tell if a function was run as an IRQ handler using the in_irq() function. But I don't think it's good practice to use it. You should be able to see just from the code in which context your function is being run; otherwise I'd say your code has a bad design.
As for the synchronization mechanism -- you are right, you have to use a spinlock, because you need to synchronize in atomic context (e.g. an interrupt) -- not that you have much choice here. You are also right that many CPU cycles will be wasted waiting for the spinlock, so you should try to minimize the amount of code under the lock.
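For completeness, the check the answer refers to looks roughly like this (for the reasons above, prefer making the context obvious from the call site):

```c
#include <linux/hardirq.h>

static void my_common_path(void)
{
    if (in_irq()) {
        /* running in a hard-IRQ handler */
    } else if (in_interrupt()) {
        /* softirq/tasklet or other atomic interrupt context */
    } else {
        /* ordinary process context */
    }
}
```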
Adding to Sam's answer: you should design your interrupt handler with top-half and bottom-half sections. This lets you have minimal code (the top half) in the interrupt handler (which you register when requesting the IRQ in the driver), and the rest (the bottom half) you can schedule using a work queue.
You can have this top half (where you are just handling the interrupt and doing some minimal reads/writes from the device) inside an atomic context protected by a spinlock, so that fewer CPU cycles are wasted waiting for the spinlock.
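A sketch of that top-half/work-queue split, with a spinlock protecting the shared data (the sample variable and all names are hypothetical; on modern kernels the hard-IRQ handler already runs with local interrupts off, hence the plain spin_lock there):

```c
#include <linux/interrupt.h>
#include <linux/spinlock.h>
#include <linux/types.h>
#include <linux/workqueue.h>

static DEFINE_SPINLOCK(buf_lock);
static struct work_struct bh_work;
static u32 latest_sample;               /* hypothetical shared device data */

/* Bottom half: runs later in process context, so it may sleep outside
 * the locked region. */
static void bh_work_fn(struct work_struct *work)
{
    u32 sample;

    spin_lock_irq(&buf_lock);           /* process context: disable IRQs */
    sample = latest_sample;
    spin_unlock_irq(&buf_lock);
    /* ... slow processing of 'sample' ... */
}

/* Top half: grab the data under the lock and defer the rest. */
static irqreturn_t my_handler(int irq, void *dev_id)
{
    spin_lock(&buf_lock);               /* local IRQs are already off here */
    latest_sample = 0x1234;             /* stand-in for a device register read */
    spin_unlock(&buf_lock);

    schedule_work(&bh_work);            /* queue the bottom half */
    return IRQ_HANDLED;
}

/* Somewhere in init: INIT_WORK(&bh_work, bh_work_fn); */
```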

request_irq to be handled by a single CPU

I would like to ask if there is a way to register the interrupt handler so that only one CPU will handle this interrupt line.
The problem is that we have a function that can be called in both normal context and interrupt context. In this function we use irqs_disabled() to check the caller's context. If the caller context is interrupt, we switch the processing to polling mode (continuously checking the interrupt status register). Although irqs_disabled() tells us that the local interrupts of the current CPU are disabled, the interrupt handler can still be called on other CPUs, and hence the interrupt status register is cleared in the interrupt handler. The polling code then reads the wrong value of the interrupt status register and does the wrong processing.
You're doing it wrong. Don't limit your interrupt to be handled by a single CPU - instead use a spin_lock_irqsave to protect the code path. This will work both on the same CPU and across CPUs.
See http://www.mjmwired.net/kernel/Documentation/spinlocks.txt for the relevant API and here is a nice article from Linux Journal that explain the usage: http://www.linuxjournal.com/article/5833
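A sketch of what the suggested fix might look like (the status register and its read-then-clear semantics are assumed from the question):

```c
#include <linux/io.h>
#include <linux/spinlock.h>
#include <linux/types.h>

static DEFINE_SPINLOCK(status_lock);

/* Callable from both process and interrupt context: spin_lock_irqsave()
 * works in either, and the lock also serializes against other CPUs. */
static void handle_device_status(void __iomem *status_reg)
{
    unsigned long flags;
    u32 status;

    spin_lock_irqsave(&status_lock, flags);
    status = readl(status_reg);         /* the read-and-clear sequence is   */
    writel(status, status_reg);         /* now atomic w.r.t. all lock users */
    /* ... act on 'status' ... */
    spin_unlock_irqrestore(&status_lock, flags);
}
```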
I've got no experience with ARM, but on x86 you can arrange for a particular interrupt to be delivered to only one processor via /proc/irq/<number>/smp_affinity - set from user space - replacing the number with the IRQ you care about - and this looks as if it's essentially generic. Note that the value you set is a bit mask, expressed in hex, without a leading 0x. I.e. if you want CPU 0, set it to 1; for CPU 1, set it to 2; etc. Beware of a process called irqbalance, which uses this mechanism and might well override whatever you have done.
But why are you doing this? If you want to know whether you are called from an interrupt, there's an interface available named something like in_interrupt(). I've used it to avoid trying to call blocking functions from code that might be called from interrupt context.

when schedule() returns?

In case of blocking IO, say a driver read, we call wait_event_interruptible() with some condition. When the condition is met, the read will be done.
I looked into the wait_event_interruptible() function; it checks the condition and calls schedule(). schedule() will look for the next runnable process, do a context switch, and the other process will run. Does it mean that the next instruction to be executed for the current process will be inside the schedule() function when this process is woken up again?
If yes, and if multiple processes voluntarily call schedule(), will every one of them resume execution inside schedule() once it gets woken up?
In the case of ret_from_interrupt, schedule() is called. When does it return, given that iret is executed after that?
I think the answer to the first question is yes as that's a fairly typical way of implementing context switching. That's how OS161 works, for example.
If the scheduler is called from an ISR, everything should be the same. The scheduler should change the context and return to the ISR and the ISR should then return using IRET. It will return to a different process/thread if the scheduler chooses to switch to a different one and therefore loads its context and saves the old one.
Re point 2: the iret instruction (return from interrupt handler) is executed, and that gets you into ret_from_interrupt. Then Linux passes control to the next task to run (schedule()). One of the overriding considerations when writing interrupt handlers is that while they are executing, many other activities are inhibited (other, lower-priority interrupts are the prime example), so you want to get out of there as fast as possible. That is why most interrupt handlers just stash away work to be done before returning, and said work is then handled elsewhere (today, in some special kernel thread).
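For reference, the typical blocking-read pattern the question describes looks roughly like this (names are hypothetical); the reader sleeps inside wait_event_interruptible(), which loops around schedule(), and resumes right there when woken:

```c
#include <linux/errno.h>
#include <linux/wait.h>

static DECLARE_WAIT_QUEUE_HEAD(read_wq);
static int data_ready;                  /* hypothetical condition flag */

/* Blocking read path: sleeps inside wait_event_interruptible(), which
 * calls schedule(); execution resumes right there on wake-up. */
static int my_read_wait(void)
{
    if (wait_event_interruptible(read_wq, data_ready))
        return -ERESTARTSYS;            /* woken by a signal instead */
    data_ready = 0;
    /* ... copy the data to the caller ... */
    return 0;
}

/* The interrupt handler (or any producer) sets the condition and wakes
 * the sleeper. */
static void data_arrived(void)
{
    data_ready = 1;
    wake_up_interruptible(&read_wq);
}
```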
