Handling IRQ delay in Linux driver

I've built a Linux driver for an SPI device.
The SPI device sends an IRQ to the processor when new data is ready to be read.
The IRQ fires about every 3 ms, and the driver then reads 2 bytes over SPI.
The problem I have is that sometimes more than 6 ms elapse between the IRQ firing and the moment the SPI transfer starts, which means I lose 2 bytes from the SPI device.
In addition, there's an unpredictable delay between the 2 bytes; sometimes it's close to 0, sometimes it's up to 300 µs.
So my question is: how can I reduce the latency between the IRQ and the SPI read?
And how can I avoid the latency between the 2 bytes?
I've tried compiling the kernel with the preemption option enabled, but it doesn't change things much.
As for the hardware, I'm using a mini2440 board running at 400 MHz, with a hardware SPI port (not GPIO bit-banged SPI).
Thanks for the help.
BR,
Vincent.

From the brochure of the Samsung S3C2440A CPU, the SPI interface hardware supports both interrupt and DMA-based operation. A look at the actual datasheet reveals that the hardware also supports a polling mode.
If you want to achieve high data rates reliably, the DMA-based approach is what you need. Once a DMA operation is configured, the hardware will move the data to RAM on its own, without the need for low-latency interrupt handling.
That said, I do not know the state of the Linux SPI drivers for your CPU. It could be a matter of missing support for DMA, of specific system settings or even of how you are using the driver from your own code. The details w.r.t. SPI are often highly dependent on the particular implementation...

I had a similar problem: I basically got an IRQ and needed to drain a queue via SPI in less than 10 ms or the chip would start to drop data. With high system load (ssh login was actually enough) sometimes the delay between the IRQ handler enqueueing the next SPI transfer with spi_async and the SPI transfer actually happening exceeded 11 ms.
The solution I found was the rt flag in struct spi_device (see here). Enabling that will set the thread that controls the SPI to real-time priority, which made the timing of all SPI transfers super reliable. And by the way, that change also removes the delay before the complete callback.
Just as a heads up, I think this was not available in earlier kernel versions.

The thing is, the Linux SPI stack uses queues for transmitting messages.
This means there is no guarantee about the delay between the moment you ask to send an SPI message and the moment it is actually sent.
In the end, to fulfill my 3 ms requirement between SPI messages, I had to stop using the Linux SPI stack and write directly to the CPU's registers inside my own IRQ handler.
That's highly dirty, but it's the only way to make it work with small delays.

Related

Should PCIe reads (non-posted) be done in NAPI context (sirq)?

We have a custom driver for an FPGA that implements multiple devices. The network devices use NAPI, and in the NAPI poll routine I have to read some registers from the FPGA.
We notice that we are spending a large amount of CPU time in sirq, and access to other devices is delayed.
My question is: since a read from the FPGA is a non-posted read (requiring a wait for the returned data), does this violate the no-blocking rule of the sirq context? Maybe the packet processing should be done in a tasklet?
I found that if I move one of the devices to its own driver, and that device only writes to the FPGA (posted writes), the performance of that device improves. I am being asked for an explanation of that result.

How to (almost) prevent FT232R (uart) receive data loss?

I need to transfer data from a bare metal microcontroller system to a linux PC with 2 MBaud.
The linux PC is currently running a 32 bit Kubuntu 14.04.
To achieve this, I tried using an FT232R-based USB-UART adapter, but I sometimes observed lost data.
As long as the Linux PC is mostly idle, it seems to work most of the time; however, I still see rare data loss.
But when I put the CPU under load (e.g. rebuilding my project), the data loss increases significantly.
After some research I read here that the FT232R has a receive buffer with a capacity of only 384 bytes. This means that the FT232R has to be read out (USB-polled) at least every 1.9 ms. FTDI recommends using flow control, but because of the microcontroller system in use, I cannot use any flow control.
I can live with the fact that there is no absolute guarantee against data loss. But the observed amount of data loss is far too high for my needs.
So I tried to find a way to increase the priority of the "FT232 driver" on my Linux system, but could not find out how to do this. It's not described in
AN220 FTDI Drivers Installation Guide for Linux
and the document
AN107 FTDI Advanced Driver Options
has a chapter about "Changing the Driver Priority", but only for Windows.
So, does anybody know how to increase the FT232R driver priority on Linux?
Any other ideas to solve this problem?
BTW: As I read the FT232H datasheet, it seems that it comes with a 1 KiB RX buffer. I ordered one right away to check out its behaviour. Edit: No significant improvement.
If you want reliable data transfer, there is absolutely no way to use any USB-to-serial bridge correctly without hardware flow control, and without dedicating at least all remaining RAM in your microcontroller as the serial buffer (or at least enough to store ~1 s worth of data).
I've been using FTDI devices since FT232AM was a hot new thing, and here's how I implement them:
(At least) four lines go between the bridge and the MCU: RXD, TXD, RTS#, CTS#.
Flow control is enabled on the PC side of things.
Flow control is enabled on the MCU side of things.
MCU code is only sending communications when it can fit a complete reply packet into the buffer. Otherwise, it lets the PC side of it time out and retry the request. For requests that stream data back, the entire frame is dropped if it can't fit in the transmit buffer at the time the frame is ready.
If you wish the PC to be reliably notified of new data, say every number of complete samples/frames, you must use event characters to flush the FTDI buffers to the host, and encode your data. HDLC works great for that purpose and is documented in free standards (RFCs and the ITU X and Q series - all free!).
The VCP driver, or the D2XX port bring-up is set up to have transfer sizes and latencies set for the needs of the application.
The communication protocol is framed, with CRCs. I usually use a cut-down version of X.25/Q.921/HDLC, limited to SNRM(E) mode for simple "dumb" command-and-response devices, and SABM(E) for devices that stream data.
The size of FTDI buffers is immaterial, your MCU should have at least an order of magnitude more storage available to buffer things.
If you're running hard real-time code, such as signal processing, make sure that you account for the overhead of lots of transmit interrupts running "back-to-back". Once the FTDI device purges its buffers after a USB transfer, and indicates that it's ready to receive more data from your MCU, your code can potentially transmit a full FTDI buffer's worth of data at once.
If you're close to running out of cycles in your real-time code, you can use a timer as the source of transmit interrupts instead of the UART interrupt. You can then set the timer rate much lower than the UART speed. This allows you to pace the transmission slower without lowering the baud rate. If you're running in setup/preoperational mode or with a lower real-time task load, you can then trivially raise the transmit rate without changing the baud rate. You can use a similar trick to pace receives by flipping the RTS# output on the MCU under timer control. Of course, this isn't a problem if you use DMA or a sufficiently fast MCU.
If you're out of timers, note that many other peripherals can also be repurposed as a source of timer interrupts.
This advice applies no matter what is the USB host.
Sidebar: Admittedly, Linux USB serial driver "architecture" is in the state of suspended animation as far as I can tell, so getting sensible results there may require a lot of work. It's not a matter of a simple kernel thread priority change, I'm afraid. Part of the reason is that funding for a lot of Linux work focuses on server/enterprise applications, and there the USB performance is a matter of secondary interest at best. It works well enough for USB storage, but USB serial is a mess nobody really cares enough to overhaul, and overhaul it needs. Just look at the amount of copy-pasta in that department...

Linux Network Driver MSI Interrupt Issue

I am attempting to create a network driver for custom hardware. I am targeting a Xilinx Zynq-7000 FPGA device.
My issue is the software handling of the MSI interrupt on the CPU side. The problem is that when the interrupt fires on the PCIe device, the driver executes the interrupt handler once and returns, but then the PCIe IO stops working and lspci shows the MSI has been reset. Any further interrupts are not caught by the kernel and the PCIe device is pretty much dead. I checked the hardware and no resets are issued to the FPGA, so I think something is going on in the kernel.
Thank you in advance.
After posting this question I discovered the problem which had been plaguing me for a little over a day. What was happening is that I mapped my DMA buffer as follows:
net_priv->rx_phy_addr = dma_map_single(&pdev->dev, net_priv->rx_virt_addr,
                                       dev->mtu, PCI_DMA_FROMDEVICE);
but unmapped the same buffer later with:
dma_unmap_single(&pdev->dev, net_priv->rx_phy_addr, BUFFER_SIZE,
                 PCI_DMA_FROMDEVICE);
My BUFFER_SIZE typo was 1 MB in size, while dev->mtu is 1.5 kB. What seems to happen is that unmapping 1 MB of space started unmapping other memory maps in addition to my 1.5 kB. As soon as dma_unmap_single completed, the PCIe IO region was dead, as was the interrupt region. Hope my mistake can help someone else out.

Efficient detection if value at memory address has been changed?

This is more of a general question. Consider an external device. From time to time this device writes data via its device driver to a specific memory address. I want to write a small C program which reads out this data. Is there a better way than just polling this address to check if the value has changed? I want to keep the CPU load low.
I have done some further research.
Is "memory-mapped IO" an option? My naive idea is to let the external device write a flag to a memory-mapped IO address which triggers a kernel device driver. The driver then "informs" the program, which processes the value. Can this work? How can a driver inform the program?
The answer may depend on what processor you intend to use, what the device is and possibly whether you are using an operating system or RTOS.
Memory mapped I/O per se is not a solution, that simply refers to I/O device registers that can be directly addressed via normal memory access instructions. Most devices will generate an interrupt when certain registers are updated or contain new valid data.
In general if using an RTOS you can arrange for the device driver to signal via a suitable IPC mechanism any client thread(s) that need to handle the data. If you are not using an RTOS, you could simply register a callback with the device driver which it would call whenever the data is updated. What the client does in the call back is its business - including reading the new data.
If the device in question generates interrupts, then the handling can be done on interrupt; if the device is capable of DMA, then it can handle blocks of data autonomously before the DMA controller generates a completion interrupt to a handler.

Critical Timing in an ARM Linux Kernel Driver

I am running linux on an MX28 (ARMv5), and am using a GPIO line to talk to a device. Unfortunately, the device has some special timing requirements. A low on the GPIO line cannot last longer than 7us, highs have no special timing requirements. The code is implemented as a kernel device driver, and toggles the GPIO with direct register writes rather than going through the kernel GPIO api. For testing, I am just generating 3 pulses. The process is as follows, all in one function so it should fit in the instruction cache:
set gpio high
Save Flags & Disable Interrupts
gpio low
pause
gpio high
repeat 2x more
Restore Flags / Re-enable Interrupts
Here's the output of a logic analyzer tied to the GPIO.
Most of the time it works just great, and the pulses last just under 1us. However, about 10% of the lows last for many, many microseconds. Even though interrupts are disabled, something is causing the flow of the code to be interrupted.
I am at a loss. RT Linux would likely not help here, because the problem is not latency, it appears to be something happening during the low, even though nothing should interrupt it with the IRQs disabled. Any suggestions would be greatly, greatly appreciated.
The ARM cache on an IMX25 (ARM926) is 16 KB code, 16 KB data L1, with a 32-byte line length, i.e. eight instructions. With the DDR-SDRAM controller running at 133 MHz and a 16-bit bus, the transfer rate is about 300 MB/s. A cache-line fill should only take about 100 ns, not 9 µs; this is about 100 times too long.
However, you have four other issues with Linux.
TLB misses and a page table walk.
Data aborts.
DMA masters stealing.
FIQ interrupts.
It is unlikely that the LCD master is stealing enough bandwidth, unless you have a huge display. Is your display larger than 1/4VGA? If not, this is only 10% of the memory bandwidth and this will pipeline with the processor. Do you have either Ethernet or USB active? These peripherals are higher data rate and could cause this type of contention with SDRAM.
All of these issues may be avoided by writing your toggler PC-relative and copying it to the IRAM. See iram_alloc.c; this file should be portable to older versions of Linux. The XBAR switch allows fetches from SDRAM and IRAM simultaneously. The IRAM can still be the target of other DMA masters. If you are really pressed, move the code to the ETB buffers, which no other master in the system can access.
The TLB miss can actually be quite steep, as it may need to run several single-beat SDRAM cycles; still, this should be under 1 µs. You have not posted code, so it is possible that a variable and/or something else is causing a data fault, which is not maskable.
If you have any drivers using the FIQ, they may still be running even though you have masked the normal IRQ interrupts. For instance, the ALSA driver for this system normally uses the FIQ.
Both the ETB and the IRAM are 32-bit data paths and low wait state. Either one will probably give better response than the DDR-SDRAM.
We have achieved sub-microsecond response by using a FIQ and IRAM to toggle GPIOs on an IMX258 with another bit-banged protocol.
One possible workaround to the problem Chris mentioned (in addition to problems with paging of kernel module code) is to use a PWM peripheral where the duration of the pulse is pre-programmed and the timing is implemented in hardware.
Fancy processors with caches are not suitable for hard realtime work. Execution time varies if cache misses are non-deterministic (and designs where cache misses are completely deterministic aren't complicated enough to justify a fancy processor).
You can try to avoid memory controller latency during critical sections by aligning the critical section so that it doesn't straddle cache lines. Or prefetch the code you will need. But this is going to be very non-portable and create a nightmare for future maintenance. And still doesn't protect the access to memory-mapped GPIO from bus contention.
