Need help handling multiple MAX3107 I2C chips on a shared ARM9 GPIO interrupt (Linux)

Our group is working with an embedded processor (Phytec LPC3180, ARM9). We have designed a board that includes four MAX3107 UART chips on one of the LPC3180's I2C buses. In case it matters, we are running kernel 2.6.10, the latest version available for this processor. (Support for this product has not been very good; we've had to develop or fix a number of the drivers provided by Phytec, and Phytec seems to have no interest in upgrading the Linux code, especially the kernel version, for this product. This is too bad, because the LPC3180 is a nice device, especially for low-power embedded products that do not require Ethernet and in fact don't want it, owing to the power consumption of Ethernet controller chips.) The handler that is installed now (developed by someone else) uses a top-half handler and a bottom-half work queue.
When one of the four devices (MAX3107 UART chips) on the I2C bus receives a character, it generates an interrupt. The interrupt outputs of all four MAX3107 chips are shared (open-drain, pulled low when active) and the line is connected to a GPIO pin of the 3180, which is configured as a level-triggered interrupt. When one of the 3107s generates an interrupt, a handler runs that does roughly the following:
spin_lock_irqsave(&lock, flags);
disable_irq_nosync(irqno);  // mask the shared GPIO line; the work item re-enables it
irq_enabled = 0;
irq_received = 1;
spin_unlock_irqrestore(&lock, flags);
set_queued_work();          // queue up work for all four devices on every interrupt,
                            // because at this point we don't know which of the four
                            // 3107s generated the interrupt
return IRQ_HANDLED;
Note (and this is what I find somewhat troubling) that the interrupt is not re-enabled before leaving the above code. Instead, the driver re-enables the interrupt from a bottom-half work queue task (using the enable_irq(LPC_IRQ_LINE) function call). Since work queue tasks do not run in interrupt context, I believe they may sleep, which I believe is a bad idea for an interrupt handler.
The rationale for the above approach follows:
1. If one of the four MAX3107 UART chips receives a character and generates an interrupt (for example), the interrupt handler needs to figure out which of the four I2C devices actually caused the interrupt. However, apparently one cannot read the I2C devices from the upper-half interrupt handler, since I2C reads can sleep, which is considered inappropriate for an interrupt handler's upper half.
2. The approach taken to address the above problem (i.e. which device caused the interrupt) is to leave the interrupt disabled and exit the top-half handler after which non-interrupt context code can query each of the four devices on the I2C bus to figure out which received the character (and hence generated the interrupt).
3. Once the bottom-half handler figures out which device generated the interrupt, the bottom-half code disables the interrupt on that chip so that it doesn't re-trigger the interrupt line to the LPC3180. After doing so it reads the serial data and exits.
The primary problem here seems to be that there is no way to query the four MAX3107 UART chips from within the top half of the interrupt handler. If the top half simply re-enabled the interrupt before returning, the same chip would immediately generate the interrupt again, since the level-triggered line is still asserted. I think this would lead to a loop in which the top half disables the interrupt, schedules the bottom-half work, and re-enables the interrupt, only to find itself back in the same place: before the bottom-half code gets to the chip causing the interrupt, another interrupt has occurred, and so on.
Any advice for dealing with this driver will be much appreciated. I really don't like the idea of the interrupt being disabled in the top half of the driver and not re-enabled before exiting the top-half driver code. This does not seem safe.
Thanks,
Jim
PS: In my reading I've discovered threaded interrupts as a means to deal with the above-described requirements (at least that's my interpretation of web site articles such as http://lwn.net/Articles/302043/). I'm not sure if the 2.6.10 kernel as provided by Phytec includes threaded interrupt functions. I intend to look into this over the next few days.

If your code is written properly, it shouldn't matter if a device issues interrupts before the handling of earlier interrupts is complete. You are correct that you don't want to do blocking operations in the top half, but blocking operations are acceptable in a work-queue bottom half (tasklets and softirqs still cannot sleep); in fact, that is part of the reason work queues exist!
In this case I would suggest an approach where the top half just schedules the bottom half, and then the bottom half loops over all 4 devices and handles any pending requests. It could be that multiple devices need processing, or none.
Update:
It is true that you may overload the system with a load test, and the software may need to be optimized to handle heavy loads. Additionally, I don't have a 3180 and four 3107s (or similar) of my own to test this on, so I am speaking theoretically, but it is not clear to me why you need to disable interrupts at all.
Generally speaking, when a hardware device asserts an interrupt, it will not assert another one until the current one is cleared. So, with 4 devices sharing one interrupt line:
Your top half fires and adds something to the work queue (i.e. triggers the bottom half).
Your bottom half scans all devices on that interrupt line (i.e. all four 3107s).
For each device that caused an interrupt, you read all the data necessary to process it fully (possibly putting it in a queue for higher-level processing).
You "clear" the interrupt on that device.
When you clear the interrupt, the device is allowed to trigger another interrupt, but not before.
More details about this particular device:
It seems that this device (MAX3107) has a 128-word buffer, and by default you are interrupted after every single word. But you should be able to take better advantage of the buffer by setting the FIFO level registers. Then you will be interrupted only after that number of words has been received, or if you fill your TX FIFO beyond its threshold, in which case you should slow down transmission (i.e. buffer more in software).
It seems the idea is basically to pull data off the devices periodically (maybe every 100 ms or 10 ms, or whatever works for you) and have the interrupt act only as a warning that you have crossed a threshold, which might schedule the periodic function for immediate execution or increase the rate at which it is called.

Interrupts are enabled & disabled because we use level-based interrupts, not edge-based. The ramifications of that are explicitly explained in the driver code header, which you have, Jim.
Level-based interrupts were required to avoid losing a character that arrives on one UART immediately after one arriving on another. With edge triggering, the shared line is still asserted when the second UART raises its request, so no second edge is generated: servicing the first UART effectively eliminates the second interrupt, and that second character would be lost. In fact, this is exactly what happened in the initial, edge-triggered version of this driver once more than one UART was exercised.
Has there been an observed failure with the current scheme?
Regards,
The Driver Author (someone else)

Related

How OS handle input operations?

I'm learning how an OS works and know that peripheral devices can send interrupts that the OS then handles. But I don't have a clear picture of how it actually handles them.
What happens when I move the mouse? Does it send interrupts every millisecond? How can the OS handle both the execution of a process and mouse positioning, especially if there is only one CPU? How can the OS perform context switches efficiently in this case?
Or, for example, say there are 3 launched processes. Process 1 is active; processes 2 and 3 are ready to go but pending. The user types something on the keyboard in process 1. As I understand it, the OS scheduler can run process 2 or process 3 while waiting for input. I assume the trick is in the timing: the processor is so fast that it can run processes 2 and 3 between the user's key presses.
Also, I would appreciate any literature references where I could learn how I/O works, especially in terms of timing and scheduling.
Let's assume it's some kind of USB device. For USB you have 2 layers of device drivers - the USB controller driver and the USB peripheral (keyboard, mouse, joystick, touchpad, ...) driver. The USB peripheral driver asks the USB controller driver to poll the device regularly (e.g. maybe every 8 milliseconds) and the USB controller driver sets that up and the USB controller hardware does this polling (not software/driver), and if it receives something from the USB peripheral it'll send an IRQ back to the USB controller driver.
When the USB controller sends an IRQ it causes the CPU to interrupt whatever it was doing and execute the USB controller driver's IRQ handler. The USB controller driver's IRQ handler examines the state of the USB controller and figures out why it sent an IRQ; and notices that the USB controller received data from a USB peripheral; so it determines which USB peripheral driver is responsible and forwards the received data to that USB peripheral's device driver.
Note: Because it's bad to spend too much time handling an IRQ (it can cause the handling of other, more important IRQs to be postponed), there will often be some kind of separation between the IRQ handler and higher-level logic at some point. This is almost always some variation of a queue: the IRQ handler puts a notification on a queue and then returns, and the notification on the queue causes something else to be run later. This might happen in the middle of the USB controller driver (e.g. the USB controller driver's IRQ handler does a little bit of work, then creates a notification that causes the rest of the USB controller driver to do the rest of the work). There are multiple ways to implement this "queue of notifications" (deferred procedure calls, message passing, some other form of communication, etc.) and different operating systems use different approaches.
The USB peripheral's device driver (e.g. keyboard driver, mouse driver, ...) receives the data sent by the USB controller's driver (that came from the USB controller that got it from polling the USB peripheral); and examines that data. Depending on what the data contains the USB peripheral's device driver will probably construct some kind of event describing what happened in a "standard for that OS" way. This can be complicated (e.g. involve tracking past state of the device and lookup tables for keyboard layout, etc). In any case the resulting event will be forwarded to something else (often a user-space process) using some form of "queue of notifications". This might be the same kind of "queue of notifications" that was used before; but might be something very different (designed to suit user-space instead of being designed for kernel/device drivers only).
Note: In general every OS that supports multi-tasking provides one or more ways that normal processes can use to communicate with each other; called "inter-process communication". There are multiple possibilities - pipes, sockets, message passing, etc. All of them interact with scheduling. E.g. a process might need to wait until it receives data and call a function (e.g. to read from a pipe, or read from a socket, or wait for a message, or ..) that (if there's no data in the queue to receive) will cause the scheduler to be told to put the task into a "blocked" state (where the task won't be given any CPU time); and when data arrives the scheduler is told to bump the task out of the "blocked" state (so it can/will be given CPU time again). Often (for good operating systems), whenever a task is bumped out of the "blocked" state the scheduler will decide if the task should preempt the currently running task immediately, or not; based on some kind of task/thread priorities. In other words; if a lower priority task is currently running and a higher priority task is waiting to receive data, then when the higher priority task receives the data it was waiting for the scheduler may immediately do a task switch (from lower priority task to higher priority task) so that the higher priority task can examine the data it received extremely quickly (without waiting for ages while the CPU is doing less important work).
In any case; the event (from the USB peripheral's device driver) is received by something (likely a process in user-space, likely causing that process to be unblocked and given CPU time immediately by the scheduler). This is the top of a "hierarchy/tree of stuff" in user-space; where each thing in the tree might look at the data it receives and may forward it to something else in the tree (using the same inter-process communication to forward the data to something else). For example; that "hierarchy/tree of stuff" might have a "session manager" at the top of the tree, then "GUI" under that, then several application windows under that. Sometimes an event will be consumed and not forwarded to something else (e.g. if you press "alt+tab" then the GUI might handle that itself, and the GUI won't forward it to the application window that currently has keyboard focus).
Eventually most events will end up at a normal application. Normal applications often have a language run-time that abstracts the operating system's details to make the application more portable (so that the programmer doesn't have to care which OS their application is running on). For example, for Java, the Java virtual machine might convert the operating system's event (which arrived in an "OS specific" format via an "OS specific" communication mechanism) into a generic "KeyEvent" (and notify any "KeyListener").
The entire path (from drivers to a function/method inside an application) could involve many thousands of lines of code written by hundreds of people spread across many separate layers, where the programmers responsible for one piece (e.g. GUI) don't have to worry much about what the programmers working on other pieces (e.g. drivers) do. For this reason, you probably won't find a single source of information that covers everything (at all layers). Instead, you'll find information for device driver developers only, or information for C++ application developers only, or ...
This is also why nobody will be able to provide more than a generic overview (without any OS specific or "layer specific" details) - they'd have to write 12 entire books to provide an extremely detailed answer.

Postpone interrupt action

I'm trying to do something odd for which I've not found a reference in the archives. On a Freescale iMX6 processor, there's an input line that generates an interrupt after being pressed (the 500 ms delay does not work); the intent of this interrupt is to notify the system of a request for an orderly shutdown. On the system in question, the attached button is also attached to the Enter key GPIO. The generated interrupt appears to be a falling edge/rising edge (or vice versa, it doesn't matter) about 75 ms apart. The interrupt does not repeat unless the key is released and pressed again.
The bit to clear the interrupt in the ISR is in a register allocated and held by the Real Time Clock driver (a side effect of the Freescale architecture) so I have to embed my interrupt handler inside the RTC driver, which of course has its own interrupt code.
I thought I was clever when I implemented the suggestion from question 18296686 about shutting down (embedded) Linux from kernel space, but that fails to distinguish between Enter and power-off. I need to detect the power-off interrupt, wait ~750-1000 ms, and check whether the button (the <Enter> key is attached to a GPIO) is still pressed, which would signal a power-off.
I was thinking of a poll(2) interface to the driver, but since the driver is really the RTC driver, the interface confuses me, and I'm looking for help implementing this.

Why is the initial state of the interrupt flag of the 6502 a 1?

I'm emulating the 6502 processor, and I'm nearly finished (in the testing phase right now) and I'm using some NES test from the nesdev site, and it's telling me that both the interrupt flag and the unused 5th flag are supposed to be set to 1 initially (i.e. the disable interrupt), but why? I can understand the unused flag part, since it's... well... unused, but I don't understand the interrupt flag. I've tried searching on Google, and some sites confirm that it's supposed to be set to 1, but no one explains the reason behind this. Why are interrupts supposed to be blocked from the start of the program?
At power-up, the 'unused' bit in the Status Register is hardwired to logic '1' by the internal circuitry of the CPU. It can never be anything other than '1', since it is not controlled by any internal flag or register but is determined by a physical connection to a 'high' signal line.
The 'I' flag in the Status Register is initialised to '1' by the CPU Reset logic, and of course can be modified by the 'SEI' and 'CLI' program instructions as well as by the CPU itself (for example during IRQ processing). The reason that the default state is '1' (thus setting the Interrupt Disable flag) is so that the host system can execute startup/reset code without having to take account of, and arrange the servicing of, IRQ assertions.
Many 6502 host systems depend on some external trigger source for IRQ and NMI assertions - often this would be a VIA or CIA companion chip, specifically designed by MOS Technology as interface adapters featuring configurable timers and other event responders, created to work seamlessly with the 6502 to raise interrupts in response to predetermined hardware conditions. These companion chips themselves require some program-driven configuration in order to set them to a known state in order to begin watching for hardware events and raising interrupts accordingly.
Since these chips may be hardware-initialised to potentially indeterminate states, the 6502 does not want to begin servicing interrupts from them immediately, as these interrupts could be completely spurious. By defaulting the 'I' flag to 'on', the CPU begins its RESET program execution knowing that the software can initialise the rest of the host system - including support chips such as VIAs and CIAs - without the possibility of a spurious IRQ occurring before the entire system is in a state where they can be handled. As an example, consider a scenario where the CPU IRQ vector in ROM points to an indirection vector in RAM, which is initialised to an IRQ service routine address by the RESET code. If an IRQ were to occur before the RESET code has initialised the RAM vector, it would almost certainly be pointing to a random address (possibly but not guaranteed to be $0000) and it is entirely probable that a system crash would occur. With the 'I' flag set by default, IRQs cannot occur until the program issues 'CLI' which would be after the RAM vector address has been correctly initialised to point to a valid IRQ service routine.
If you study common examples of 6502 RESET code, you'll see the repeated theme of a suite of system-initialisation routines to set up the host environment (including support-chip timer registers for IRQ generation) followed by a 'CLI' instruction as one of the last things the code does. Most environments tend to be essentially IRQ-driven, doing their housework and service routines at precise intervals (e.g. once per video frame) so the RESET code ends with 'CLI' to denote that initialisation - including IRQ-generation setup - is complete and IRQ servicing can begin.
Now, having said all of that, what's to stop an NMI from being asserted at any point during RESET processing, hmm? The CPU will diligently suspend the RESET program and jump through the NMI ROM vector - and the 'I' flag has no effect (as you'd expect - NMI is Non Maskable and cannot be ignored). So, ironically, although the 'I' flag defaults to '1' in order to protect the RESET code from spurious or premature IRQs, there still and always exists the possibility of a spurious NMI which cannot be blocked, and could therefore engender the same problem if the vector points to RAM (either directly or indirectly).
It is the task of the programmer to find a way to manage such untimely NMIs in such a way that if they occur then they have no effect, or at least that they do not interfere with RESET processing. And therefore, arguably, if the software has to cater for that scenario, it's not much more effort to do the same for IRQ - meaning the defaulting of the 'I' flag to '1' could have been dropped from the CPU initialisation circuitry, or alternatively that NMIs should be hardwired to be ignored during RESET. But then, of course, they wouldn't be Non Maskable in all cases, and you'd need a special 'RESET' flag in the Status Register that you could clear to tell the CPU that RESET processing was complete and NMIs could now be serviced normally. But I digress. ;)
Usually a machine will need to set up its global state before it is safe to receive an interrupt. If interrupts were initially enabled then you would never know what had been initialised and what hadn't in your interrupt routine.
So it's about allowing a known state to be imposed before events start rolling in.
On the NES specifically it probably makes little difference: the built-in hardware generates non-maskable interrupts and doesn't do so until it is told to start. Most cartridges with standard interrupt-generating hardware also need to be told in advance to start generating interrupts and don't just do so from power on.
However this 6502 behaviour is generic to the part. An example problem they might be trying to avoid could be a system with a two-second startup time and a keyboard that generates interrupts. The interrupt routine might buffer the keystrokes. But if it tries to do that before the system is otherwise set up then it might end up writing bytes to a random location in memory.

Clarification about the behaviour of request_threaded_irq

I have scoured the web, but haven't found a convincing answer to a couple of related questions I have, with regard to the "request_threaded_irq" feature.
Question1:
Firstly, I was reading this article about threaded IRQs:
http://lwn.net/Articles/302043/
and there is this one line that isn't clear to me:
"Converting an interrupt to threaded makes only sense when the handler code takes advantage of it by integrating tasklet/softirq functionality and simplifying the locking."
I understand that had we gone with a "traditional" top half/bottom half approach, we would have needed either spin locks or disabling local IRQs to touch shared data. But what I don't understand is how threaded interrupts would simplify the need for locking by integrating tasklet/softirq functionality.
Question2:
Secondly, what advantage (if any) does a request_threaded_irq approach have over a work-queue-based bottom half? In both cases it seems as though the "work" is deferred to a dedicated thread. So, what is the difference?
Question3:
Lastly, in the following prototype:
int request_threaded_irq(unsigned int irq, irq_handler_t handler, irq_handler_t thread_fn, unsigned long irqflags, const char *devname, void *dev_id)
Is it possible that the "handler" part of the IRQ is continuously triggered by the relevant IRQ (say, a UART receiving characters at a high rate), even while the "thread_fn" part of the interrupt handler (writing received bytes to a circular buffer) is busy processing IRQs from previous wakeups? If so, wouldn't the handler be trying to "wake up" an already running thread_fn? How would the running irq thread_fn behave in that case?
I would really appreciate it if someone could help me understand this.
Thanks,
vj
For Question 2,
An IRQ thread is set up with a higher (real-time) priority when it is created, unlike work queues.
In kernel/irq/manage.c, you'll see some code like the following for creation of kernel threads for threaded IRQs:
static const struct sched_param param = {
        .sched_priority = MAX_USER_RT_PRIO/2,
};
t = kthread_create(irq_thread, new, "irq/%d-%s", irq,
                   new->name);
if (IS_ERR(t)) {
        ret = PTR_ERR(t);
        goto out_mput;
}
sched_setscheduler_nocheck(t, SCHED_FIFO, &param);
Here you can see, the scheduling policy of the kernel thread is set to an RT one (SCHED_FIFO) and the priority of the thread is set to MAX_USER_RT_PRIO/2 which is higher than regular processes.
For Question 3,
The situation you described can also occur with normal interrupts. Typically in the kernel, interrupts are disabled while an ISR executes. During the execution of the ISR, characters can keep filling the device's buffer and the device can and must continue to assert an interrupt even while interrupts are disabled.
It is the job of the device to make sure the IRQ line is kept asserted till all the characters are read and any processing is complete by the ISR. It is also important that the interrupt is level triggered, or depending on the design be latched by the interrupt controller.
Lastly, the device/peripheral should have an adequately sized FIFO so that characters delivered at a high rate are not lost by a slow ISR. The ISR should also be designed to read as many characters as possible when it executes.
Generally speaking, what I've seen is that a controller has a FIFO of a certain size X, and when the FIFO is half full (X/2) it fires an interrupt that causes the ISR to grab as much data as possible. The ISR reads as much as possible and then clears the interrupt. Meanwhile, if the FIFO is still at least half full, the device keeps the interrupt line asserted, causing the ISR to execute again.
Previously, the bottom-half was not a task and still could not block. The only difference was that interrupts were disabled. The tasklet or softirq allow different inter-locks between the driver's ISR thread and the user API (ioctl(), read(), and write()).
I think the work queue is near equivalent. However, the tasklet/ksoftirq has a high priority and is used by all ISR based functionality on that processor. This may give better scheduling opportunities. Also, there is less for the driver to manage; everything is already built-in to the kernel's ISR handler code.
You must handle this. Typically ping-pong buffers can be used or a kfifo like you suggest. The handler should be greedy and get all data from the UART before returning IRQ_WAKE_THREAD.
For Question no 3,
When a threaded IRQ is requested with the IRQF_ONESHOT flag, the corresponding interrupt line is masked (disabled) when it fires, and it is unmasked again when the thread function completes. Hence no interrupt will fire while the respective threaded handler is running. (Without IRQF_ONESHOT, the line is unmasked as soon as the primary handler returns.)
The original work of converting "hard"/"soft" handlers to threaded handlers was done by Thomas Gleixner & team when building the PREEMPT_RT Linux (aka Linux-as-an-RTOS) project (it's not part of mainline).
To truly have Linux run as an RTOS, we cannot tolerate a situation where an interrupt handler interrupts the most critical rt (app) thread; but how can we ensure that the app thread even overrides an interrupt? By making the interrupt threaded and schedulable (SCHED_FIFO), with a lower priority than the app thread (interrupt threads' rtprio defaults to 50). So an rt SCHED_FIFO app thread with an rtprio of 60 would be able to "preempt" (closely enough that it works) even an interrupt thread. That should answer your Q2.
Regarding Q3:
As others have said, your code must handle this situation.
Having said that, please note that a key point of using a threaded handler is that you can do work that (possibly) blocks (sleeps). If your "bottom half" work is guaranteed to be non-blocking and must be fast, please use the traditional-style 'top-half/bh' handlers.
How can we do that? Simple: don't use request_threaded_irq(); just call request_irq(). The comment in the code clearly says (regarding the 3rd parameter):
* @thread_fn: Function called from the irq handler thread
*             If NULL, no irq thread is created
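In addition, you can pass the IRQF_NO_THREAD flag to request_irq() so that the handler is never force-threaded (e.g. with the threadirqs boot option or on PREEMPT_RT).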
(BTW, a quick check with cscope on the 3.14.23 kernel source tree shows that request_irq() is called 1502 times [giving us non-threaded interrupt handling], and request_threaded_irq() [threaded interrupts] is explicitly called 204 times).

How shared IRQ races are avoided in Linux

I am considering an upcoming situation in an embedded Linux project (no hardware yet) where two external chips will need to share a single physical IRQ line. In hardware, this line supports edge-triggered but not level-triggered interrupts.
Looking at the shared irq support in Linux, I understand that the way this would work with two separate drivers is that each would have their interrupt handler called, check their hardware and handle if appropriate.
However I imagine the following race condition and would like to know if I'm missing something or what might be done to work around this. Let's say there are two external interrupt sources, devices A and B:
device B interrupt occurs, IRQ goes active
IRQ edge causes Linux core interrupt handler to run
ISR for device A runs, finds no interrupt pending
device A interrupt occurs, IRQ stays active (wire-OR)
ISR for device B runs, finds interrupt pending, handles and clears it
core interrupt handler exits
IRQ stays active, no more edges are generated, IRQ is locked up
It seems that for this to be fixed, the core interrupt handler would have to check the IRQ level after running all handlers, and if still active, run them all again. Will Linux do this? I don't think the interrupt core knows how to check the level of an IRQ line.
Is this race something that can actually happen, and if so how do I deal with this?
Basically, with the hardware you've described, a wired-OR of the interrupts will NEVER work correctly on its own.
If you want to do a wired-OR, you really need to use level-sensitive IRQ inputs. If that's not feasible, then perhaps you can add some kind of interrupt controller. That device would take N level-sensitive inputs and have one output and some kind of 'clear'. When the interrupt controller gets a clear, it would lower its output, then re-assert the output if any of its inputs were still asserted.
On the software side, one thing you could look at is also running the IRQ line to another processor input. This would at least allow you to check its state, but the Linux core ISR handling isn't going to know anything about this, so you'll have to patch in something to make it check the state and cycle through the ISRs again. Also, this means that in heavy interrupt-loading situations you're NEVER going to get out of this ISR. Given that you're doing a wired-OR on the IRQs, I'm assuming these devices won't be interrupting too often.
One other thing is to look really hard at the processor. There may be some kind of trick you can pull with the interrupt setup in order to get it to recognize the interrupt again.
I wouldn't try anything too tricky myself, I'd either separate the sources onto separate IRQ inputs, change to a level-sensitive input, or add an interrupt controller chip.
