I have been working on IRQ handlers using the regmap irq chip implementation.
I have seen that IRQ handler execution is highly inconsistent, especially if the IRQ is generated continuously during suspend. The IRQ chokes and never clears the interrupt source, i.e. the handler sometimes never runs. Even if the handler runs halfway and the system then sleeps, it does not continue on resume.
It's creating serious issues. How do I handle this?
Regmap uses threaded IRQs exclusively. In addition, I was making I2C calls in the nested handlers, which are again threaded IRQs. Because of this, I would always be in process context and never in IRQ context. An I2C transfer can call schedule(), which brings in a completely different execution flow. In addition, there were problems wake-enabling the IRQ.
Say I have some code as follows:
local_irq_disable();
... // some interrupts come during this time
local_irq_enable();
After I call local_irq_enable(), all the interrupts that were blocked (the pending interrupts) are still there and cause the CPU to respond.
Is there anything that will clear pending interrupts?
My code runs on an ARM AArch64 machine.
A typical chain is that the CPU interrupt pin is multiplexed via an interrupt controller (e.g. the GIC) to a set of devices.
Disabling interrupts merely shunts the pin on the CPU; the interrupt controller still maintains the pending state. You could use a feature of the interrupt controller to mask all interrupts, which would then let you enable the CPU interrupts without receiving any. I am not sure what the point of that would be, when you could just leave the CPU ignoring interrupts.
To truly clear the pending interrupts, you need to invoke the device-specific code (i.e. the interrupt handler) for each device with a pending interrupt. You could look through the status bits of the GIC, identify each pending interrupt, then look through the kernel's interrupt structures to determine the relevant device and invoke its handler. It is a lot easier to just turn interrupts back on.
If you disable interrupts, there will probably be a pending interrupt that's been sent to your CPU from the PIC that it's waiting for you to acknowledge. So before you re-enable interrupts, you'd have to tell the PIC to de-assert this interrupt if present.
While the PIC was waiting for acknowledgment, it may have been buffering other interrupts (or sending them to other CPUs). So you'd need to tell the PIC to clear these if present, or wait a sufficient amount of time for other CPUs to handle all these interrupts. This is assuming of course that the interrupts are being distributed evenly and no interrupt is biased towards your CPU.
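The point both answers make, that the CPU-level mask only defers delivery while the controller keeps the pending state, can be sketched as a small user-space model. This is only an illustration under stated assumptions: `struct irq_model`, `raise_irq()`, `model_irq_disable()` and `model_irq_enable()` are hypothetical names, not kernel or GIC interfaces, and each "handler" is reduced to clearing its own pending bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model; none of these names are kernel or GIC APIs. */
struct irq_model {
    uint32_t pending;   /* latched by the interrupt controller, one bit per line */
    bool cpu_irqs_on;   /* stands in for local_irq_disable()/local_irq_enable() */
    unsigned handled;   /* number of handler invocations so far */
};

/* Deliver every pending interrupt while the CPU accepts interrupts. */
static void deliver_pending(struct irq_model *m)
{
    while (m->cpu_irqs_on && m->pending) {
        uint32_t lowest = m->pending & (0u - m->pending); /* lowest set bit */
        m->pending &= ~lowest;  /* the device handler clears its source */
        m->handled++;
    }
}

/* A device raises its line: the controller latches it even if the CPU is masked. */
static void raise_irq(struct irq_model *m, unsigned line)
{
    assert(line < 32);
    m->pending |= UINT32_C(1) << line;
    deliver_pending(m);
}

static void model_irq_disable(struct irq_model *m)
{
    m->cpu_irqs_on = false;
}

/* Re-enabling does not discard anything; the latched interrupts fire now. */
static void model_irq_enable(struct irq_model *m)
{
    m->cpu_irqs_on = true;
    deliver_pending(m);
}
```

In this model, raising lines 3 and 5 between `model_irq_disable()` and `model_irq_enable()` leaves both bits set in `pending`, and both handlers run as soon as interrupts are re-enabled. The only way the bits go away is that something clears them at the source.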
What is a chained IRQ? What do chained_irq_enter and chained_irq_exit do? After an interrupt is raised the IRQ line is disabled, yet chained_irq_enter calls functions related to masking interrupts. If the line is already disabled, why mask the interrupt?
What is a chained IRQ?
There are two approaches to calling the interrupt handlers of child devices from the IRQ handler of the parent (interrupt controller) device.
Chained interrupts:
"chained" means that those interrupts are just chain of function calls (for example, SoC's GPIO module interrupt handler is being called from GIC interrupt handler, just as a function call)
generic_handle_irq() is used for interrupts chaining
child IRQ handlers are being called inside of parent HW IRQ handler
you can't call functions that may sleep in chained (child) interrupt handlers, because they are still in atomic context (HW interrupt)
this approach is commonly used in drivers for SoC's internal GPIO modules
Nested interrupts:
"nested" means that those interrupts can be interrupted by another interrupt; they are not really HW IRQs, but rather threaded IRQs
handle_nested_irq() is used for handling nested interrupts
child IRQ handlers are called by handle_nested_irq() from the parent's IRQ thread rather than from hard IRQ context; we need them to run in process context, so that we can call sleeping bus functions (like I2C functions that may sleep)
you are able to call functions that may sleep inside of nested (child) interrupt handlers
this approach is commonly used in drivers for external chips, like GPIO expanders, because they are usually connected to the SoC via an I2C bus, and I2C functions may sleep
Speaking of drivers discussed above:
irq-gic driver uses the CHAINED GPIO irqchips approach for handling systems with multiple GICs; this commit adds that functionality
gpio-omap driver (mentioned above) uses the GENERIC CHAINED GPIO irqchips approach. See this commit. It was converted from the regular CHAINED GPIO irqchips approach so that on a real-time kernel it gets a threaded IRQ handler, while on a non-RT kernel it gets a hard IRQ handler
gpio-max732x driver uses the NESTED THREADED GPIO irqchips approach
What do chained_irq_enter and chained_irq_exit do?
Those functions implement hardware interrupt flow control, i.e. they notify the interrupt controller chip when to mask and unmask the current interrupt.
For FastEOI interrupt controllers (the most modern way):
chained_irq_enter() does nothing
chained_irq_exit() calls the irq_eoi() callback to tell the interrupt controller that interrupt processing is finished
For interrupt controllers with mask/unmask/ack capabilities:
chained_irq_enter() masks the current interrupt, and acknowledges it as well if the ack callback is set
chained_irq_exit() unmasks the interrupt
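The two flows described above can be sketched in user space. This is a minimal model, not the kernel source: `struct model_chip` is a hypothetical stand-in for the kernel's `struct irq_chip` (the real callbacks take a `struct irq_data *` argument), and the one-letter trace exists only to make the call order visible.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the callbacks of the kernel's struct irq_chip. */
struct model_chip {
    void (*irq_eoi)(void);
    void (*irq_mask_ack)(void);
    void (*irq_mask)(void);
    void (*irq_ack)(void);
    void (*irq_unmask)(void);
};

/* Records one letter per operation so the ordering can be inspected. */
static char trace[16];
static size_t trace_len;

static void log_op(char c)
{
    assert(trace_len < sizeof(trace) - 1);
    trace[trace_len++] = c;
}

static void op_eoi(void)    { log_op('E'); }
static void op_mask(void)   { log_op('M'); }
static void op_ack(void)    { log_op('A'); }
static void op_unmask(void) { log_op('U'); }

/* Entry: nothing for FastEOI chips, otherwise mask (and ack if available). */
static void model_chained_enter(const struct model_chip *chip)
{
    if (chip->irq_eoi)
        return;
    if (chip->irq_mask_ack) {
        chip->irq_mask_ack();
        return;
    }
    chip->irq_mask();
    if (chip->irq_ack)
        chip->irq_ack();
}

/* Exit: EOI for FastEOI chips, otherwise unmask. */
static void model_chained_exit(const struct model_chip *chip)
{
    if (chip->irq_eoi)
        chip->irq_eoi();
    else
        chip->irq_unmask();
}

/* One chained dispatch; 'H' marks where the child handler would run. */
static const char *run_chained(const struct model_chip *chip)
{
    memset(trace, 0, sizeof(trace));
    trace_len = 0;
    model_chained_enter(chip);
    log_op('H');    /* generic_handle_irq() in a real driver */
    model_chained_exit(chip);
    return trace;
}
```

With only `irq_eoi` set, `run_chained()` yields the trace "HE" (handler, then EOI). With `irq_mask`, `irq_ack` and `irq_unmask` set it yields "MAHU" (mask, ack, handler, unmask).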
Because after an interrupt is raised the IRQ line is disabled, but chained_irq_enter calls functions related to masking interrupts. If the line is already disabled, why mask the interrupt?
The IRQ line is disabled, but we still need to write to the EOI register at the end of interrupt processing, or send an ACK for edge-triggered interrupts.
This explains why interrupts are disabled in interrupt handler.
Read the Linux kernel documentation itself to understand these APIs:
https://www.kernel.org/doc/Documentation/gpio/driver.txt
CHAINED GPIO irqchips: these are usually the type that is embedded on
an SoC. This means that there is a fast IRQ handler for the GPIOs that
gets called in a chain from the parent IRQ handler, most typically the
system interrupt controller. This means the GPIO irqchip is registered
using irq_set_chained_handler() or the corresponding
gpiochip_set_chained_irqchip() helper function, and the GPIO irqchip
handler will be called immediately from the parent irqchip, while
holding the IRQs disabled. The GPIO irqchip will then end up calling
something like this sequence in its interrupt handler:
static irqreturn_t tc3589x_gpio_irq(int irq, void *data)
chained_irq_enter(...);
generic_handle_irq(...);
chained_irq_exit(...);
Are interrupts executed on all processors, or only on one?
For instance, when I type, do all processors handle the interrupt? Or only one of them, while the rest carry on with other tasks?
Here's a high-level view of the low-level processing. I'm describing a simple typical architecture; real architectures can be more complex or differ in ways that don't matter at this level of detail.
When an interrupt occurs, the processor checks whether interrupts are masked. If they are, nothing happens until they are unmasked. When interrupts become unmasked, if there are any pending interrupts, the processor picks one.
Then the processor executes the interrupt by branching to a particular address in memory. The code at that address is called the interrupt handler. When the processor branches there, it masks interrupts (so the interrupt handler has exclusive control) and saves the contents of some registers in some place (typically other registers).
The interrupt handler does what it must do, typically by communicating with the peripheral that triggered the interrupt to send or receive data. If the interrupt was raised by the timer, the handler might trigger the OS scheduler, to switch to a different thread. When the handler finishes executing, it executes a special return-from-interrupt instruction that restores the saved registers and unmasks interrupts.
The interrupt handler must run quickly, because it's preventing any other interrupt from running. In the Linux kernel, interrupt processing is divided in two parts:
The “top half” is the interrupt handler. It does the minimum necessary, typically communicating with the hardware and setting a flag somewhere in kernel memory.
The “bottom half” does any other necessary processing, for example copying data into process memory, updating kernel data structures, etc. It can take its time and even block waiting for some other part of the system since it runs with interrupts enabled.
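The split between the two halves can be sketched with a small user-space model. This is an illustration only, assuming a single producer and a single consumer; `struct deferred_work`, `top_half()` and `bottom_half()` are hypothetical names, and a real kernel driver would use one of the kernel's deferral mechanisms rather than a hand-rolled queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_LEN 8

/* Hypothetical state shared between the two halves. */
struct deferred_work {
    unsigned char queue[QUEUE_LEN];
    size_t head;            /* next entry the bottom half will process */
    size_t tail;            /* next free slot for the top half */
    bool work_pending;      /* the "flag somewhere in kernel memory" */
    unsigned processed_sum; /* stands in for the slower processing result */
};

/* "Top half": store the byte read from the hardware, set the flag, return.
 * Returns false if the queue is full and the byte had to be dropped. */
static bool top_half(struct deferred_work *w, unsigned char hw_byte)
{
    assert(w != NULL);
    if (w->tail - w->head == QUEUE_LEN)
        return false;
    w->queue[w->tail % QUEUE_LEN] = hw_byte;
    w->tail++;
    w->work_pending = true;
    return true;
}

/* "Bottom half": runs later and drains everything the top half queued.
 * Returns the number of entries processed. */
static unsigned bottom_half(struct deferred_work *w)
{
    unsigned n = 0;

    assert(w != NULL);
    while (w->head != w->tail) {
        w->processed_sum += w->queue[w->head % QUEUE_LEN];
        w->head++;
        n++;
    }
    w->work_pending = false;
    return n;
}
```

Two calls to `top_half()` only queue data and set `work_pending`; nothing is processed until `bottom_half()` runs, which is the property that keeps the time spent with interrupts masked short.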
I'm trying to better understand the interaction between the "return IRQ_HANDLED" statement used in a GPIO pin-based interrupt handler (top-half) and the GPIO pin hardware. In particular, consider the hypothetical situation wherein a device has pulled a GPIO pin low to indicate that it needs attention. This causes the associated (top half) interrupt handler to be invoked. Now assume that the top-half handler queues up some work and then returns with "return IRQ_HANDLED" but that for whatever reason the interrupt has not been cleared on the device that generated it (i.e. the device is holding the GPIO pin in the low state). Does invocation of "return IRQ_HANDLED" cause the interrupt to be regenerated? I ask this in the context of the following article:
http://www.makelinux.net/books/lkd2/ch06lev1sec4
"Reentrancy and Interrupt Handlers
Interrupt handlers in Linux need not be reentrant. When a given interrupt handler is executing, the corresponding interrupt line is masked out on all processors, preventing another interrupt on the same line from being received. Normally all other interrupts are enabled, so other interrupts are serviced, but the current line is always disabled. Consequently, the same interrupt handler is never invoked concurrently to service a nested interrupt. This greatly simplifies writing your interrupt handler."
The above comment indicates that upon invocation of an interrupt handler, the interrupt line for that interrupt is masked. I'm trying to figure out whether "return IRQ_HANDLED" is what unmasks the interrupt line. And, with respect to the hypothetical case described above, what would happen if I "return IRQ_HANDLED" while the device has not really had its interrupt cleared and hence is still holding the GPIO pin in the low (triggered) state? More specifically, will this cause the interrupt to be generated again, such that the processor never has a chance to do the work queued when the interrupt first occurred? That is, would this lead to an interrupt storm in which the processor is interrupted endlessly and no useful processing can occur? I should add that I ask this question in the context of a single-CPU Linux ARM9 system (Phytec LPC3180) running kernel 2.6.10.
Thanks in advance,
Jim
PS: I'm not clear as to the difference between enabling/disabling an interrupt (in particular, an interrupt associated with a particular GPIO pin) and masking/unmasking the same GPIO interrupt.
I am considering an upcoming situation in an embedded Linux project (no hardware yet) where two external chips will need to share a single physical IRQ line. In hardware, this line supports edge-triggered but not level-triggered interrupts.
Looking at the shared irq support in Linux, I understand that the way this would work with two separate drivers is that each would have their interrupt handler called, check their hardware and handle if appropriate.
However I imagine the following race condition and would like to know if I'm missing something or what might be done to work around this. Let's say there are two external interrupt sources, devices A and B:
device B interrupt occurs, IRQ goes active
IRQ edge causes Linux core interrupt handler to run
ISR for device A runs, finds no interrupt pending
device A interrupt occurs, IRQ stays active (wire-OR)
ISR for device B runs, finds interrupt pending, handles and clears it
core interrupt handler exits
IRQ stays active, no more edges are generated, IRQ is locked up
It seems that for this to be fixed, the core interrupt handler would have to check the IRQ level after running all handlers, and if still active, run them all again. Will Linux do this? I don't think the interrupt core knows how to check the level of an IRQ line.
Is this race something that can actually happen, and if so how do I deal with this?
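The sequence above can be reproduced in a small user-space model, together with the "check the level and run the handlers again" idea. This is a sketch under stated assumptions: all names are hypothetical, the line is modeled as the logical OR of the two device flags, and the model assumes the software can read the line level, which the hardware described here may not allow.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of two devices wire-ORed onto one edge-triggered line. */
struct shared_line {
    bool dev_a_pending;
    bool dev_b_pending;
    bool inject_a_mid_dispatch; /* hook: A fires right after its ISR has run */
    unsigned handled_a;
    unsigned handled_b;
};

/* The wire-OR: the line is active while either device asserts it. */
static bool line_active(const struct shared_line *s)
{
    return s->dev_a_pending || s->dev_b_pending;
}

static void isr_a(struct shared_line *s)
{
    if (s->dev_a_pending) {
        s->dev_a_pending = false;
        s->handled_a++;
    }
}

static void isr_b(struct shared_line *s)
{
    if (s->dev_b_pending) {
        s->dev_b_pending = false;
        s->handled_b++;
    }
}

/* One pass over the shared handlers, as triggered by a single edge. */
static void dispatch_once(struct shared_line *s)
{
    isr_a(s);
    if (s->inject_a_mid_dispatch) {
        s->inject_a_mid_dispatch = false;
        s->dev_a_pending = true;    /* line already active: no new edge */
    }
    isr_b(s);
}

/* Workaround sketch: repeat the pass while the line still reads active.
 * max_passes bounds the loop so heavy interrupt load cannot trap us here. */
static void dispatch_until_idle(struct shared_line *s, unsigned max_passes)
{
    assert(max_passes > 0);
    do {
        dispatch_once(s);
    } while (line_active(s) && --max_passes > 0);
}
```

Starting with device B pending and the injection hook armed, `dispatch_once()` handles B but leaves the line active with A unserviced, and since no further edge will occur that is the lockup. `dispatch_until_idle()` from the same starting state services both devices on its second pass.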
Basically, with the hardware you've described, doing a wired-OR for the interrupts will NEVER work correctly on its own.
If you want to do wired-OR, you really need to be using level-sensitive IRQ inputs. If that's not feasible, then perhaps you can add in some kind of interrupt controller. That device would take N level-sensitive inputs, and have one output, and some kind of 'clear'. When the interrupt controller gets a clear it would lower its output, then re-assert the output if any of its inputs were still asserted.
On the software side, you could look at running the IRQ line to another processor input. This would allow you to at least check the state, but the Linux core ISR handling isn't going to know anything about this, so you'll have to patch in something to get it to check the line and cycle through the ISRs again. Also, this means that under heavy interrupt load you're NEVER going to get out of this ISR. Given that you're doing a wired-OR on the IRQs, I'm assuming these devices won't be interrupting too often.
One other thing is to look really hard at the processor. There may be some kind of trick you can pull with the interrupt setup in order to get it to recognize the interrupt again.
I wouldn't try anything too tricky myself, I'd either separate the sources onto separate IRQ inputs, change to a level-sensitive input, or add an interrupt controller chip.