Top halves and bottom halves concept clarification - linux

According to the top-half/bottom-half design, when an interrupt arrives it is handled in two parts. The so-called top half is the routine that actually responds to the interrupt—the one you register with request_irq. The bottom half is a routine that is scheduled by the top half to be executed later, at a safer time. The big difference between the top-half handler and the bottom half is that all interrupts are enabled during execution of the bottom half—that's why it runs at a safer time. In the typical scenario, the top half saves device data to a device-specific buffer, schedules its bottom half, and exits: this operation is very fast. The bottom half then performs whatever other work is required, such as awakening processes, starting up another I/O operation, and so on. This setup permits the top half to service a new interrupt while the bottom half is still working.
But if the interrupt is handled at a "safer time" by the bottom half, then logically an interrupt has to wait until the bottom half finds a safe time to execute, which would limit the system: the response stays pending until the interrupt is fully handled. For example, suppose I am working on a project that blinks an LED when the temperature rises above a specific limit. If the interrupt handling is done only when some safe time is available (according to the bottom-half concept), then the blink operation will be delayed. Please clarify my doubt: how are all the interrupts handled?

When top-half/bottom-half interrupt architecture is used, there is commonly a high-priority interrupt handling thread.
This interrupt handling thread has a higher priority than other threads in the system (some vendor SDKs specify an "Interrupt" priority level for this purpose). There is often a queue, and the thread sleeps when there is no work in the queue. This thread/queue is designed so that work can be safely added from an interrupt context.
When a top-half handler is called, it will handle the hardware operations and then add the bottom-half handler(s) to the interrupt queue. The top-half handler returns and interrupt context is exited. The OS then checks which thread should run next. Because the interrupt thread has work available, and because it has the highest priority, it will run next. This minimizes the latency that you are worried about.
There will naturally be some latency jitter, because there may be other interrupts in the queue that fired ahead of the LED (in your example). There are different solutions to this, depending on the application and real-time requirements. One is to have a sorted queue based on an interrupt priority level. This incurs additional cost when enqueueing operations, but it also ensures your interrupts will be handled by priority. The other option, for critical latencies, is to do all the work in the top-half interrupt handler.
It's important to keep in mind the purposes of such an architecture:
- Minimize the time spent in interrupt context, because other interrupts are (likely) disabled while you are processing the current one, and you are increasing the latency for handling them.
- Prevent users from calling functions which are not safe to invoke from an interrupt context.
We still want our bottom-half handlers to be run as soon as possible to reduce latency, so when you say "wait for a safer time", this means "outside of the interrupt context".
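As a concrete illustration, here is a minimal Linux sketch of this pattern using a workqueue as the "interrupt handling thread"; the device name and handler bodies are hypothetical, not taken from the question:

#include <linux/interrupt.h>
#include <linux/workqueue.h>

static struct work_struct my_bh_work;

/* Bottom half: runs in a kernel worker thread with interrupts enabled. */
static void my_bottom_half(struct work_struct *work)
{
        /* Wake up processes, start the next I/O operation, etc. */
}

/* Top half: runs in interrupt context; registered with request_irq(). */
static irqreturn_t my_top_half(int irq, void *dev_id)
{
        /* Save urgent device data, then queue the deferred work. */
        schedule_work(&my_bh_work);
        return IRQ_HANDLED;
}

/* In the driver's init/probe code:
 *     INIT_WORK(&my_bh_work, my_bottom_half);
 *     request_irq(irq, my_top_half, 0, "mydev", dev);
 */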

If your blink operation is too important to be delayed, you could put it in the top half and have no bottom half at all. However, depending on what you do in the top half, it may or may not affect system performance.
I would suggest you write code for both cases and do some profiling.

Interrupt: An interrupt is an event that alters the sequence of instructions executed by a processor, in response to an electrical signal generated by a hardware circuit either inside or outside the CPU.
When an interrupt is generated, it is handled in two halves:
1) Top half
2) Bottom half
Top half: The top half executes as soon as the CPU receives the interrupt. In top-half context, interrupts and the scheduler are disabled, so this part of the code contains only the critical work. Its execution time should be as short as possible, because while it runs interrupts are disabled and we don't want to miss other interrupts generated by the devices.
Bottom half: The job of the bottom half is to run the work deferred (left over) by the top half. While this code executes, interrupts are enabled but the scheduler is still unavailable (softirq/tasklet context cannot sleep). Bottom halves are implemented as softirqs and tasklets to run the deferred work.
Note: The top-half code should be as short as possible, take deterministic time, and should not contain any blocking calls.
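For example, a minimal tasklet-based bottom half might look like the sketch below. Names are hypothetical, and this uses the classic 2.6-era DECLARE_TASKLET() API (the tasklet interface was reworked and deprecated in recent kernels):

#include <linux/interrupt.h>

/* Bottom half: interrupts enabled, but still atomic context -- no sleeping. */
static void my_tasklet_fn(unsigned long data)
{
        /* Process the data the top half saved. */
}
static DECLARE_TASKLET(my_tasklet, my_tasklet_fn, 0);

/* Top half: ack the hardware, save the data, defer the rest. */
static irqreturn_t my_isr(int irq, void *dev_id)
{
        tasklet_schedule(&my_tasklet);
        return IRQ_HANDLED;
}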

Related

Can the same timer interrupt occur in parallel?

I implemented a timer interrupt handler in a kernel module.
This timer interrupt handler requires about 1000us to run.
And I want this timer to fire every 10us.
(In doing so, I hope the same handler will run in parallel.)
(I know that this can create a tremendous amount of interrupt overhead, but I want to implement it for some testing.)
But this handler does not seem to run in parallel.
The timer interrupt seems to wait until the handler in progress has finished.
Can the same timer interrupt occur in parallel?
If not, is there a kernel mechanism that can run the same handler in parallel?
If the timer triggers every 10us and requires 1000us (1ms) to complete, you would need 100 dedicated CPUs to barely keep up with the timer. The short answer is no, the interrupt system isn't going to support this. If an interrupt recursed, it would inevitably consume the interrupt handler stack.
Interrupts typically work by having a short bit of code be directly invoked when the interrupt asserts. If more work is to be done, this short bit would schedule a slower bit to follow on, and inhibit this source of interrupt. This is to minimize the latency caused by disparate devices seeking cpu attention. The slower bit, when it determines it has satiated the device request, can re-enable interrupts from this source.
[ In linux, the short bit is called the top half; the slower bit the bottom half. It is a bit confusing, because decades of kernel implementation pre-linux named it exactly the other way around. Best to avoid these terms. ]
One of many ways to get the effect you desire is to have this slow handler release a semaphore then re-enable the interrupt. You could then have an appropriate number of threads sit in a loop acquiring the semaphore then performing your task.
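A kernel-side sketch of that suggestion, with hypothetical names (the size of the worker pool and the work body are up to you):

#include <linux/semaphore.h>
#include <linux/kthread.h>

static struct semaphore work_sem = __SEMAPHORE_INITIALIZER(work_sem, 0);

/* Each worker thread sleeps on the semaphore, then handles one unit of work. */
static int worker_fn(void *unused)
{
        while (!kthread_should_stop()) {
                if (down_interruptible(&work_sem))
                        continue;
                /* the ~1000us of processing goes here */
        }
        return 0;
}

/* The short interrupt/timer handler just signals the pool;
 * up() is safe to call from interrupt context. */
static void signal_work(void)
{
        up(&work_sem);
}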

Function calling bottom half of interrupt handler in linux

In Linux, the handling of interrupt handler is divided into two components : top half, and bottom half.
From my understanding, the bottom-half of an interrupt handler can be handled in many ways : softirq, tasklet, work-queue, and timer-list.
I want to know which function(s) in the Linux kernel handle the schedule function of these bottom-halves.
Edit: I looked into the handling of softirqs and tasklets, and it seems that both of them are handled through the __do_softirq (http://lxr.linux.no/linux+v2.6.32.58/kernel/softirq.c#L207) function. However, I still see many paths inside the handler execution which pass through the schedule() function of the Linux kernel and then diverge. I am not able to explain these paths properly.
Some intuition for guiding you towards this function:
The scheduling of a pending task (bottom half) should be triggered by some event. Kernel events can either be a system call or an interrupt. I think that the event which triggers a bottom half is an interrupt and not a system call.
As per my knowledge, these are the steps followed on the arrival of an interrupt:
1. Interrupt arrives at core
2. Top half of the interrupt handler is run
3. Check the pending queue to see if there is a task which needs attention.
4. If there is any pending task, then execute it
I was going through the function list of all the OS handlers, and observed that the execution of many handlers passes through the schedule() function of the Linux kernel. Since this function is called so often from many interrupt handlers, I suppose that the bottom half of the interrupt handlers should be called from within this function.
The schedule() function calls the post_schedule() function at the end. I tracked all the functions between these two function calls. There are many different function lists between them, raising the suspicion that the bottom-half functions must lie on the path from schedule() to post_schedule(). However, the sheer number of different macros and functions in the kernel is making it really difficult to pinpoint the function from which the scheduler jumps into the bottom half.
The top half of a device driver's interrupt handler must return IRQ_HANDLED, IRQ_WAKE_THREAD or IRQ_NONE to indicate to the interrupt subsystem whether the irq was handled. If IRQ_WAKE_THREAD is returned, then the threaded bottom-half part of the interrupt handler is scheduled for execution. Normally bottom halves have higher priority than other normal kernel tasks. See https://lwn.net/Articles/302043/ for more details.
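A minimal sketch of this mechanism (handler names are illustrative):

#include <linux/interrupt.h>

/* Top half: runs in hard-irq context. */
static irqreturn_t my_hard_handler(int irq, void *dev_id)
{
        /* Quiet the device, capture any urgent state... */
        return IRQ_WAKE_THREAD;  /* ...and ask the core to run my_thread_fn */
}

/* Threaded bottom half: runs in its own kernel thread and may sleep. */
static irqreturn_t my_thread_fn(int irq, void *dev_id)
{
        /* Slow work: bus transactions, waking processes, etc. */
        return IRQ_HANDLED;
}

/* Registration:
 *     request_threaded_irq(irq, my_hard_handler, my_thread_fn,
 *                          0, "mydev", dev);
 */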

Which context a given function is called in Linux Kernel

Is there a straightforward mechanism to identify whether a given function is called in interrupt context or in process context? That is the first part of the question. The second part is: how do I synchronize two execution paths, one in interrupt context and the other in process context? If my understanding is right, we cannot use mutexes in interrupt context, since it is not allowed to sleep. On the other hand, if I use spinlocks, the other path will burn CPU cycles. What is the best way to synchronize these two? Correct me if my understanding is totally wrong.
You can tell if a function was run as an IRQ handler using the in_irq() function. But I don't think it's good practice to use it. You should be able to tell just from the code in which context your function is being run; otherwise I'd say your code has a bad design.
As for the synchronization mechanism -- you are right, you have to use a spinlock, because you need to synchronize with atomic context (e.g. an interrupt) -- not that you have much choice here. You are also right that CPU cycles are wasted when waiting for a spinlock, so you should try to minimize the amount of your code under the lock.
Adding to Sam's answer -- you should design your interrupt handler with top-half and bottom-half sections. This lets you have minimal code (the top half) in the interrupt handler (which you register when requesting the IRQ in the driver); the rest (the bottom half) you can schedule using a work queue.
You can have this top half (where you are just handling the interrupt and doing some minimal reads/writes from the device) inside an atomic context protected by a spinlock, so that fewer CPU cycles are wasted waiting for the spinlock.
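To make the locking rule concrete, here is a minimal sketch (the shared variable is hypothetical): process context must use spin_lock_irqsave() so the ISR cannot preempt it on the same CPU while it holds the lock, whereas the ISR itself can take the plain lock:

#include <linux/spinlock.h>
#include <linux/interrupt.h>

static DEFINE_SPINLOCK(dev_lock);
static unsigned int shared_count;        /* shared with the ISR */

/* Process context. */
static void process_side(void)
{
        unsigned long flags;

        spin_lock_irqsave(&dev_lock, flags);   /* also masks local IRQs */
        shared_count++;                         /* keep this section short */
        spin_unlock_irqrestore(&dev_lock, flags);
}

/* Interrupt context. */
static irqreturn_t my_isr(int irq, void *dev_id)
{
        spin_lock(&dev_lock);
        shared_count = 0;
        spin_unlock(&dev_lock);
        return IRQ_HANDLED;
}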

Clarification about the behaviour of request_threaded_irq

I have scoured the web, but haven't found a convincing answer to a couple of related questions I have, with regard to the "request_threaded_irq" feature.
Question1:
Firstly, I was reading this article, regarding threaded IRQ's:
http://lwn.net/Articles/302043/
and there is this one line that isn't clear to me:
"Converting an interrupt to threaded makes only sense when the handler
code takes advantage of it by integrating tasklet/softirq
functionality and simplifying the locking."
I understand that had we gone ahead with a "traditional" top-half/bottom-half approach, we would have needed either spinlocks or disabling local IRQs to meddle with shared data. But what I don't understand is how threaded interrupts would simplify the locking by integrating tasklet/softirq functionality.
Question2:
Secondly, what advantage (if any), does a request_threaded_handler approach have over a work_queue based bottom half approach ? In both cases it seems, as though the "work" is deferred to a dedicated thread. So, what is the difference ?
Question3:
Lastly, in the following prototype:
int request_threaded_irq(unsigned int irq, irq_handler_t handler, irq_handler_t thread_fn, unsigned long irqflags, const char *devname, void *dev_id)
Is it possible that the "handler" part of the IRQ is continuously triggered by the relevant IRQ (say a UART receiving characters at a high rate), even while the "thread_fn" (writing received bytes to a circular buffer) part of the interrupt handler is busy processing IRQs from previous wakeups? So wouldn't the handler be trying to "wake up" an already running "thread_fn"? How would the running irq thread_fn behave in that case?
I would really appreciate if someone can help me understand this.
Thanks,
vj
For Question 2,
An IRQ thread is set up on creation with a higher priority, unlike workqueues.
In kernel/irq/manage.c, you'll see some code like the following for creation of kernel threads for threaded IRQs:
static const struct sched_param param = {
        .sched_priority = MAX_USER_RT_PRIO/2,
};

t = kthread_create(irq_thread, new, "irq/%d-%s", irq, new->name);
if (IS_ERR(t)) {
        ret = PTR_ERR(t);
        goto out_mput;
}
sched_setscheduler_nocheck(t, SCHED_FIFO, &param);
Here you can see that the scheduling policy of the kernel thread is set to an RT one (SCHED_FIFO) and the priority of the thread is set to MAX_USER_RT_PRIO/2, which is higher than that of regular processes.
For Question 3,
The situation you described can also occur with normal interrupts. Typically in the kernel, interrupts are disabled while an ISR executes. During the execution of the ISR, characters can keep filling the device's buffer and the device can and must continue to assert an interrupt even while interrupts are disabled.
It is the job of the device to make sure the IRQ line is kept asserted until all the characters are read and any processing is complete by the ISR. It is also important that the interrupt is level-triggered, or, depending on the design, latched by the interrupt controller.
Lastly, the device/peripheral should have an adequately sized FIFO so that characters delivered at a high rate are not lost by a slow ISR. The ISR should also be designed to read as many characters as possible when it executes.
Generally speaking, what I've seen is: a controller has a FIFO of a certain size X, and when the FIFO fills to X/2 it fires an interrupt that causes the ISR to grab as much data as possible. The ISR reads as much as possible and then clears the interrupt. Meanwhile, if the FIFO is still at least half full, the device keeps the interrupt line asserted, causing the ISR to execute again.
Previously, the bottom half was not a task and still could not block. The only difference was that interrupts were enabled while it ran. The tasklet or softirq allows different interlocks between the driver's ISR thread and the user API (ioctl(), read(), and write()).
I think the work queue is near equivalent. However, the tasklet/ksoftirq has a high priority and is used by all ISR based functionality on that processor. This may give better scheduling opportunities. Also, there is less for the driver to manage; everything is already built-in to the kernel's ISR handler code.
You must handle this. Typically ping-pong buffers can be used, or a kfifo like the circular buffer you suggest. The handler should be greedy and get all data from the UART before returning IRQ_WAKE_THREAD.
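A sketch of that greedy-handler idea using a kfifo; the UART register accessors uart_rx_ready() and uart_read_char() are hypothetical stand-ins for your device's I/O:

#include <linux/interrupt.h>
#include <linux/kfifo.h>

/* Hypothetical device accessors, provided elsewhere by the driver. */
extern bool uart_rx_ready(void);
extern unsigned char uart_read_char(void);

static DEFINE_KFIFO(rx_fifo, unsigned char, 256);

/* Hard handler: drain the UART FIFO completely before waking the thread. */
static irqreturn_t uart_hard_handler(int irq, void *dev_id)
{
        while (uart_rx_ready())
                kfifo_put(&rx_fifo, uart_read_char());
        return IRQ_WAKE_THREAD;
}

/* Threaded handler: consume at leisure; may sleep. */
static irqreturn_t uart_thread_fn(int irq, void *dev_id)
{
        unsigned char c;

        while (kfifo_get(&rx_fifo, &c)) {
                /* push c into the tty layer, wake readers, etc. */
        }
        return IRQ_HANDLED;
}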
For Question 3:
When a threaded IRQ fires, the corresponding interrupt line is masked/disabled; it is re-enabled towards the end, once the threaded handler has run to completion. Hence no interrupt will fire while the respective threaded handler is running.
The original work of converting "hard"/"soft" handlers to threaded handlers was done by Thomas Gleixner & team when building the PREEMPT_RT Linux (aka Linux-as-an-RTOS) project (it's not part of mainline).
To truly have Linux run as an RTOS, we cannot tolerate a situation where an interrupt handler interrupts the most critical RT (app) thread; but how can we ensure that the app thread overrides even an interrupt? By making it (the interrupt) threaded and schedulable (SCHED_FIFO) with a lower priority than the app thread (an interrupt thread's rtprio defaults to 50). So an "rt" SCHED_FIFO app thread with an rtprio of 60 would be able to "preempt" (closely enough that it works) even an interrupt thread. That should answer your Question 2.
With regard to Question 3:
As others have said, your code must handle this situation.
Having said that, please note that a key point of using a threaded handler is that you can do work that (possibly) blocks (sleeps). If your "bottom half" work is guaranteed to be non-blocking and must be fast, use the traditional-style top-half/bottom-half handlers.
How can we do that? Simple: don't use request_threaded_irq(); just call request_irq() -- the comment in the code clearly says (with regard to the 3rd parameter):
* @thread_fn: Function called from the irq handler thread.
* If NULL, no irq thread is created
Alternatively, you can pass the IRQF_NO_THREAD flag to request_irq.
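In code, both options look like this (a sketch; handler and device names are illustrative):

/* 1) Classic top half only -- request_irq() is just
 *    request_threaded_irq() with thread_fn == NULL: */
request_irq(irq, my_handler, 0, "mydev", dev);

/* 2) Explicitly forbid threading of this handler (e.g. under
 *    forced irq threading): */
request_irq(irq, my_handler, IRQF_NO_THREAD, "mydev", dev);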
(BTW, a quick check with cscope on the 3.14.23 kernel source tree shows that request_irq() is called 1502 times [giving us non-threaded interrupt handling], and request_threaded_irq() [threaded interrupts] is explicitly called 204 times).

Need help handling multiple shared I2C MAX3107 chips on shared ARM9 GPIO interrupt (linux)

Our group is working with an embedded processor (Phytec LPC3180, ARM9). We have designed a board that includes four MAX3107 UART chips on one of the LPC3180's I2C busses. In case it matters, we are running kernel 2.6.10, the latest version available for this processor. (Support of this product has not been very good; we've had to develop or fix a number of the drivers provided by Phytec, and Phytec seems to have no interest in upgrading the Linux code (especially the kernel version) for this product. This is too bad in that the LPC3180 is a nice device, especially in the context of low-power embedded products that DO NOT require ethernet and in fact don't want ethernet, owing to the associated power consumption of ethernet controller chips.) The handler that is installed now (developed by someone else) is based on a top-half handler and a bottom-half work queue approach.
When one of the four devices (MAX3107 UART chips) on the I2C bus receives a character, it generates an interrupt. The interrupt lines of all four MAX3107 chips are shared (open drain, pull-down) and the line is connected to a GPIO pin of the 3180 which is configured for level interrupt. When one of the 3107s generates an interrupt, a handler is run which does roughly the following processing:
spin_lock_irqsave(&lock, flags);
disable_irq_nosync(irqno);
irq_enabled = 0;
irq_received = 1;
spin_unlock_irqrestore(&lock, flags);
set_queued_work();  // Queue up work for all four devices for every interrupt,
                    // because at this point we don't know which of the four
                    // 3107's generated the interrupt
return IRQ_HANDLED;
Note, and this is what I find somewhat troubling, that the interrupt is not re-enabled before leaving the above code. Rather, the driver is written such that the interrupt is re-enabled by a bottom-half work queue task (using the enable_irq(LPC_IRQ_LINE) function call). Since the work queue tasks do not run in interrupt context, I believe they may sleep, something that I believe to be a bad idea for an interrupt handler.
The rationale for the above approach follows:
1. If one of the four MAX3107 uart chips receives a character and generates an interrupt (for example), the interrupt handler needs to figure out which of the four I2C devices actually caused the interrupt. However, and apparently, one cannot read the I2C devices from within the context of the upper half interrupt handler since the I2C reads can sleep, something considered inappropriate for an interrupt handler upper-half.
2. The approach taken to address the above problem (i.e. which device caused the interrupt) is to leave the interrupt disabled and exit the top-half handler after which non-interrupt context code can query each of the four devices on the I2C bus to figure out which received the character (and hence generated the interrupt).
3. Once the bottom-half handler figures out which device generated the interrupt, the bottom-half code disables the interrupt on that chip so that it doesn't re-trigger the interrupt line to the LPC3180. After doing so it reads the serial data and exits.
The primary problem here seems to be that there is no way to query the four MAX3107 UART chips from within the interrupt handler top half. If the top half simply re-enabled interrupts before returning, the same chip would generate the interrupt again, leading, I think, to a situation where the top half disables the interrupt and schedules the bottom-half work queues, only to find itself back in the same place because, before the bottom-half code gets to the chip causing the interrupt, another interrupt has occurred, and so forth.
Any advice for dealing with this driver will be much appreciated. I really don't like the idea of allowing the interrupt to be disabled in the top half of the driver yet not re-enabled prior to exiting the top-half code. This does not seem safe.
Thanks,
Jim
PS: In my reading I've discovered threaded interrupts as a means to deal with the above-described requirements (at least that's my interpretation of web site articles such as http://lwn.net/Articles/302043/). I'm not sure if the 2.6.10 kernel as provided by Phytec includes threaded interrupt functions. I intend to look into this over the next few days.
If your code is written properly, it shouldn't matter if a device issues interrupts before handling of prior interrupts is complete. You are correct that you don't want to do blocking operations in the top half, but blocking operations are acceptable in a bottom half; in fact, that is part of the reason bottom halves exist!
In this case I would suggest an approach where the top half just schedules the bottom half, and then the bottom half loops over all 4 devices and handles any pending requests. It could be that multiple devices need processing, or none.
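A sketch of that structure, keeping the level-triggered masking the original driver already does (function and variable names are hypothetical; the per-chip I2C accesses are elided):

#include <linux/interrupt.h>
#include <linux/workqueue.h>

static struct work_struct scan_work;
static int shared_irq;

/* Top half: mask the level-triggered line, queue the scan, done. */
static irqreturn_t max3107_top_half(int irq, void *dev_id)
{
        disable_irq_nosync(irq);
        schedule_work(&scan_work);
        return IRQ_HANDLED;
}

/* Bottom half: runs in a worker thread and may sleep, so the
 * I2C transfers are fine here. */
static void max3107_scan(struct work_struct *work)
{
        int i;

        for (i = 0; i < 4; i++) {
                /* Read chip i's interrupt status over I2C; if it has
                 * pending data, drain its FIFO and clear its interrupt. */
        }
        enable_irq(shared_irq);  /* safe now: all sources de-asserted */
}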
Update:
It is true that you may overload the system with a load test, and the software may need to be optimized to handle heavy loads. Additionally I don't have a 3180, and four 3107s (or similar) of my own to test this out on, so I am speaking theoretically, but I am not clear why you need to disable interrupts at all.
Generally speaking when a hardware device asserts an interrupt it will not assert another one until the current one is cleared. So you have 4 devices sharing one int line:
1. Your top half fires and adds something to the work queue (i.e. triggers the bottom half).
2. Your bottom half scans all devices on that int line (i.e. all four 3107s).
3. If one of them caused the interrupt, you then read all the data necessary to fully process it (possibly putting it in a queue for higher-level processing?).
4. You "clear" the interrupt on the current device.
When you clear the interrupt then the device is allowed to trigger another interrupt, but not before.
More details about this particular device:
It seems that this device (MAX3107) has a buffer of 128 words, and by default you are getting interrupted after every single word. But it seems that you should be able to take better advantage of the buffer by setting the FIFO level registers. Then you will get interrupted only after that number of words has been received (or if you fill your TX FIFO beyond the threshold, in which case you should slow down the transmit speed, i.e. buffer more in software).
It seems the idea is basically to pull data off the devices periodically (maybe every 100ms or 10ms, or whatever works for you) and then have the interrupt act only as a warning that you have crossed a threshold, which might schedule the periodic function for immediate execution or increase the rate at which it is called.
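A sketch of that periodic-drain idea using a delayed work item. Note this uses the modern mod_delayed_work() API, which did not exist in 2.6.10, and all names are hypothetical:

#include <linux/interrupt.h>
#include <linux/workqueue.h>

#define POLL_MS 10

static struct delayed_work poll_work;

/* Periodic drain: reschedules itself every POLL_MS milliseconds.
 * (Set up once with INIT_DELAYED_WORK(&poll_work, poll_fn).) */
static void poll_fn(struct work_struct *work)
{
        /* read FIFO levels and drain data from each chip here */
        schedule_delayed_work(&poll_work, msecs_to_jiffies(POLL_MS));
}

/* Threshold interrupt: pull the next poll forward to "now". */
static irqreturn_t threshold_isr(int irq, void *dev_id)
{
        mod_delayed_work(system_wq, &poll_work, 0);
        return IRQ_HANDLED;
}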
Interrupts are enabled & disabled because we use level-based interrupts, not edge-based. The ramifications of that are explicitly explained in the driver code header, which you have, Jim.
Level-based interrupts were required to avoid losing an edge interrupt from a character that arrives on one UART immediately after one arriving on another: servicing the first effectively eliminates the second, so that second character would be lost. In fact, this is exactly what happened in the initial, edge-interrupt version of this driver once >1 UART was exercised.
Has there been an observed failure with the current scheme?
Regards,
The Driver Author (someone else)
